
Gliding Vertex on the Horizontal Bounding Box for Multi-Oriented Object Detection



Abstract:

Object detection has recently experienced substantial progress. Yet, the widely adopted horizontal bounding box representation is not appropriate for ubiquitous oriented objects such as objects in aerial images and scene texts. In this paper, we propose a simple yet effective framework to detect multi-oriented objects. Instead of directly regressing the four vertices, we glide the vertex of the horizontal bounding box on each corresponding side to accurately describe a multi-oriented object. Specifically, we regress four length ratios characterizing the relative gliding offset on each corresponding side. This may facilitate the offset learning and avoid the confusion issue of sequential label points for oriented objects. To further remedy the confusion issue for nearly horizontal objects, we also introduce an obliquity factor based on the area ratio between the object and its horizontal bounding box, guiding the selection of horizontal or oriented detection for each object. We add these five extra target variables to the regression head of Faster R-CNN, which requires negligible extra computation time. Extensive experimental results demonstrate that, without bells and whistles, the proposed method achieves superior performance on multiple multi-oriented object detection benchmarks, including object detection in aerial images, scene text detection, and pedestrian detection in fisheye images.
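To make the representation in the abstract concrete, here is a minimal sketch (not the authors' code) of how the five extra regression targets could be derived from a ground-truth quadrilateral: four gliding length ratios, one per side of the horizontal bounding box, plus the obliquity factor as the area ratio. It assumes the vertices are ordered clockwise starting from the vertex lying on the top side of the horizontal box.

```python
import numpy as np

def gliding_vertex_targets(quad):
    """Sketch of the five extra targets from the abstract.
    quad: (4, 2) vertices in clockwise order, starting from the
    vertex on the top side of the horizontal bounding box
    (this ordering convention is an assumption of the sketch)."""
    quad = np.asarray(quad, dtype=float)
    xmin, ymin = quad.min(axis=0)
    xmax, ymax = quad.max(axis=0)
    w, h = xmax - xmin, ymax - ymin
    # One vertex glides along each side of the horizontal box; each
    # ratio is the gliding offset from the side's starting corner,
    # normalized by that side's length.
    a1 = (quad[0, 0] - xmin) / w   # top side, from the top-left corner
    a2 = (quad[1, 1] - ymin) / h   # right side, from the top-right corner
    a3 = (xmax - quad[2, 0]) / w   # bottom side, from the bottom-right corner
    a4 = (ymax - quad[3, 1]) / h   # left side, from the bottom-left corner
    # Obliquity factor: quadrilateral area (shoelace formula) over the
    # area of its horizontal bounding box; near 1 means nearly horizontal.
    x, y = quad[:, 0], quad[:, 1]
    area = 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))
    r = area / (w * h)
    return (a1, a2, a3, a4), r
```

For an axis-aligned box all four ratios are zero and the obliquity factor is 1, so such objects can fall back to plain horizontal detection; a square rotated by 45 degrees gives ratios of 0.5 and an obliquity factor of 0.5.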
Page(s): 1452 - 1459
Date of Publication: 18 February 2020


PubMed ID: 32086194



1 Introduction

Object detection has achieved considerable progress thanks to convolutional neural networks (CNNs). The state-of-the-art methods [1], [2], [3] usually aim to detect objects by regressing horizontal bounding boxes. Yet multi-oriented objects are ubiquitous in many scenarios; examples are objects in aerial images and scene texts. A horizontal bounding box does not provide accurate orientation and scale information, which poses problems in real applications such as object change detection in aerial images and recognition of sequential characters in multi-oriented scene texts.

References
1.
S. Ren, K. He, R. Girshick, and J. Sun, “Faster R-CNN: Towards real-time object detection with region proposal networks,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 39, no. 6, pp. 1137–1149, Jun. 2017.
2.
J. Redmon and A. Farhadi, “YOLOv3: An incremental improvement,” 2018, arXiv:1804.02767.
3.
T.-Y. Lin, P. Dollár, R. Girshick, K. He, B. Hariharan, and S. Belongie, “Feature pyramid networks for object detection,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2017, pp. 2117–2125.
4.
X. Zhou, et al., “EAST: An efficient and accurate scene text detector,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2017, pp. 2642–2651.
5.
J. Ding, N. Xue, Y. Long, G.-S. Xia, and Q. Lu, “Learning roi transformer for oriented object detection in aerial images,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2019, pp. 2849–2858.
6.
M. Liao, B. Shi, and X. Bai, “Textboxes++: A single-shot oriented scene text detector,” IEEE Trans. Image Process., vol. 27, no. 8, pp. 3676–3690, Aug. 2018.
7.
W. He, X.-Y. Zhang, F. Yin, and C.-L. Liu, “Multi-oriented and multi-lingual scene text detection with direct regression,” IEEE Trans. Image Process., vol. 27, no. 11, pp. 5406–5419, Nov. 2018.
8.
C. Zhang, et al., “Look more than once: An accurate detector for text of arbitrary shapes,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2019, pp. 10552–10561.
9.
B. Shi, X. Bai, and S. Belongie, “Detecting oriented text in natural images by linking segments,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2017, pp. 3482–3490.
10.
P. Lyu, C. Yao, W. Wu, S. Yan, and X. Bai, “Multi-oriented scene text detection via corner localization and region segmentation,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2018, pp. 7553–7563.
11.
Z. Liu, G. Lin, S. Yang, J. Feng, W. Lin, and W. L. Goh, “Learning markov clustering networks for scene text detection,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2018, pp. 6936–6944.
12.
Z. Zhang, C. Zhang, W. Shen, C. Yao, W. Liu, and X. Bai, “Multi-oriented text detection with fully convolutional networks,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2016, pp. 4159–4167.
13.
R. Girshick, J. Donahue, T. Darrell, and J. Malik, “Rich feature hierarchies for accurate object detection and semantic segmentation,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2014, pp. 580–587.
14.
R. Girshick, “Fast R-CNN,” in Proc. IEEE Int. Conf. Comput. Vis., 2015, pp. 1440–1448.
15.
J. Dai, Y. Li, K. He, and J. Sun, “R-FCN: Object detection via region-based fully convolutional networks,” in Proc. Advances Neural Inf. Process. Syst., 2016, pp. 379–387.
16.
J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, “You only look once: Unified, real-time object detection,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2016, pp. 779–788.
17.
J. Redmon and A. Farhadi, “YOLO9000: Better, faster, stronger,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2017, pp. 7263–7271.
18.
W. Liu, et al., “SSD: Single shot multibox detector,” in Proc. Eur. Conf. Comput. Vis., 2016, pp. 21–37.
19.
T.-Y. Lin, P. Goyal, R. Girshick, K. He, and P. Dollár, “Focal loss for dense object detection,” in Proc. IEEE Int. Conf. Comput. Vis., 2017, pp. 2980–2988.
20.
H. Law and J. Deng, “CornerNet: Detecting objects as paired keypoints,” in Proc. Eur. Conf. Comput. Vis., 2018, pp. 734–750.
21.
X. Zhou, J. Zhuo, and P. Krahenbuhl, “Bottom-up object detection by grouping extreme and center points,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2019, pp. 850–859.
22.
K. Duan, S. Bai, L. Xie, H. Qi, Q. Huang, and Q. Tian, “CenterNet: Keypoint triplets for object detection,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2019, pp. 6569–6578.
23.
G.-S. Xia, et al., “DOTA: A large-scale dataset for object detection in aerial images,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2018, pp. 3974–3983.
24.
L. Liu, Z. Pan, and B. Lei, “Learning a rotation invariant detector with rotatable bounding box,” 2017, arXiv:1711.09405.
25.
Z. Zhang, W. Guo, S. Zhu, and W. Yu, “Toward arbitrary-oriented ship detection with rotated region proposal and discrimination networks,” IEEE Geosci. Remote Sens. Lett., vol. 15, no. 11, pp. 1745–1749, Nov. 2018.
26.
S. M. Azimi, E. Vig, R. Bahmanyar, M. Körner, and P. Reinartz, “Towards multi-class object detection in unconstrained remote sensing imagery,” in Proc. Asian Conf. Comput. Vis., 2018, pp. 150–165.
27.
X. Yang, et al., “R2CNN++: Multi-dimensional attention based rotation invariant detector with robust anchor strategy,” 2018, arXiv:1811.07126.
28.
G. Zhang, S. Lu, and W. Zhang, “CAD-Net: A context-aware detection network for objects in remote sensing imagery,” IEEE Trans. Geosci. Remote Sens., vol. 57, no. 12, pp. 10015–10024, 2019.
29.
Y. Xu, Y. Wang, W. Zhou, Y. Wang, Z. Yang, and X. Bai, “Textfield: Learning a deep direction field for irregular scene text detection,” IEEE Trans. Image Process., vol. 28, no. 11, pp. 5566–5579, Nov. 2019.
30.
J. Ma, et al., “Arbitrary-oriented scene text detection via rotation proposals,” IEEE Trans. Multimedia, vol. 20, no. 11, pp. 3111–3122, Nov. 2018.
