Abstract:
Video accounts for the largest share of global data traffic and is generated in almost every aspect of our lives. Because of its huge volume, we depend increasingly on machine-intelligence-based analysis. Meanwhile, video coding technology has been continuously improved for better compression efficiency. However, state-of-the-art video coding standards, such as H.265/HEVC and Versatile Video Coding (VVC), are still designed on the assumption that the compressed video will later be watched by a human. Such a design is not optimal when the compressed video is consumed by computer vision applications. While the human visual system (HVS) is consistently sensitive to high-contrast content, the impact of pixels on computer vision algorithms is task driven. For example, because detection algorithms are trained on different categories of objects, the influence of the same image content varies from one detector to another. Human-oriented video coding strategies may therefore be suboptimal when the compressed signal is further processed by algorithms, because the encoder is unaware of task-specific information. In this article, taking object detection as an example, we propose a novel video coding strategy for computer vision. By protecting information according to its importance to an object detector rather than to the human visual system, the proposed method has the potential to achieve better object detection performance at the same bandwidth. The main contributions of our paper are: 1) a model of the relationship between object detection accuracy and bit rate; 2) a back-propagation-based method for analyzing the influence of each pixel on the detection of target objects; 3) an object-detection-oriented bit allocation and codec control parameter determination scheme; and 4) an evaluation metric for comparing the impact of video coding strategies on a given object detector over a predefined range of bit rates.
Experimental results demo...
Published in: IEEE Transactions on Circuits and Systems for Video Technology ( Volume: 31, Issue: 12, December 2021)
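
Contribution 4 in the abstract describes a metric that compares coding strategies for a given detector over a predefined bit-rate range. The abstract does not define that metric, so the following is only a minimal sketch of one plausible reading: the average detection accuracy over a bit-rate interval, obtained by linearly interpolating measured (bit rate, accuracy) points and integrating with the trapezoidal rule. The function name `mean_accuracy_over_range` and all of its parameters are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def mean_accuracy_over_range(
    bitrates: Sequence[float],
    accuracies: Sequence[float],
    lo: float,
    hi: float,
) -> float:
    """Average accuracy over the bit-rate interval [lo, hi].

    Assumption (not from the paper): the accuracy-vs-bit-rate curve is
    approximated as piecewise linear between the measured points.
    """
    if len(bitrates) != len(accuracies) or len(bitrates) < 2:
        raise ValueError("need at least two matching (bitrate, accuracy) points")

    # Sort the measurements by bit rate so interpolation is well defined.
    points = sorted(zip(bitrates, accuracies))
    if not (points[0][0] <= lo < hi <= points[-1][0]):
        raise ValueError("[lo, hi] must lie inside the measured bit-rate range")

    def interpolate(x: float) -> float:
        # Linear interpolation between the two neighbouring measurements.
        for (x0, y0), (x1, y1) in zip(points, points[1:]):
            if x0 <= x <= x1:
                if x1 == x0:
                    return y0
                return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
        raise ValueError("bit rate outside the measured range")

    # Integration grid: both interval ends plus every measurement strictly inside.
    xs = [lo] + [x for x, _ in points if lo < x < hi] + [hi]
    ys = [interpolate(x) for x in xs]

    # Trapezoidal rule, then normalise by the interval width.
    area = sum(
        (xb - xa) * (ya + yb) / 2.0
        for xa, xb, ya, yb in zip(xs, xs[1:], ys, ys[1:])
    )
    return area / (hi - lo)
```

Under this reading, two coding strategies would be compared by evaluating the function on each strategy's measured curve over the same `[lo, hi]` interval; the strategy with the higher value preserves detection accuracy better, on average, across that range.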