Abstract:
Several recent works seek to create lightweight deep networks for video object detection on mobile devices. We observe that many existing detectors, previously deemed computationally costly for mobiles, intrinsically support adaptive inference and offer a multi-branch object detection framework (MBODF). Here, an MBODF refers to a solution that has many execution branches, from which one can dynamically choose at inference time to satisfy varying latency requirements (e.g., by varying the resolution of an input frame). In this paper, we ask, and answer, a wide-ranging question across all MBODFs: how to expose the right set of execution branches, and then how to schedule the optimal one at inference time? In addition, we uncover the importance of making a content-aware decision on which branch to run, as the optimal one is conditioned on the video content. Finally, we explore content-aware schedulers, first an Oracle one and then a practical one, leveraging various lightweight feature extractors. Our evaluation shows that, layered on a Faster R-CNN-based MBODF and compared to 7 baselines, our Smartadapt achieves a higher Pareto optimal curve in the accuracy-vs-latency space for the ILSVRC VID dataset.
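The branch-selection idea described above can be illustrated with a minimal sketch. This is not the paper's scheduler: the `Branch` type, the `pick_branch` function, the branch names, and all latency and accuracy numbers are hypothetical, and the per-branch accuracy is simply assumed to come from some content-aware predictor.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Branch:
    """One execution branch; every value here is a hypothetical illustration."""
    name: str
    latency_ms: float  # assumed to be profiled offline on the target device
    accuracy: float    # assumed to be predicted for the current video content


def pick_branch(branches: Sequence[Branch], budget_ms: float) -> Optional[Branch]:
    """Return the most accurate branch whose latency fits the budget, else None."""
    feasible = [b for b in branches if b.latency_ms <= budget_ms]
    if not feasible:
        return None
    # Break accuracy ties in favor of the faster branch.
    return max(feasible, key=lambda b: (b.accuracy, -b.latency_ms))


# Hypothetical branches that differ only in input resolution.
branches = [
    Branch("low_res", 20.0, 0.55),
    Branch("mid_res", 45.0, 0.68),
    Branch("high_res", 90.0, 0.74),
]

choice = pick_branch(branches, budget_ms=50.0)
print(choice.name if choice else "no feasible branch")
```

In a content-aware setting, the `accuracy` values would be re-estimated as the video content changes, so the same latency budget could select different branches over time.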
Date of Conference: 18-24 June 2022
Date Added to IEEE Xplore: 27 September 2022