I. Introduction
Recent years have witnessed an exponential increase in the amount of image and video data, driven by the rapid development of multimedia applications. Consequently, highly efficient compression of images and videos has remained a fundamental challenge in multimedia communication and processing for decades. Early on, images and videos were intended primarily for human viewing and entertainment. As machine vision technologies advance, a growing volume of visual data must be analyzed for intelligent applications, which poses new challenges for machine vision-oriented data compression. The demands that human vision and machine analysis place on compression differ fundamentally. The traditional image compression paradigm for human vision aims to preserve signal fidelity as much as possible under a bit rate budget. For machine vision, the common practice is to extract and compress compact features that retain sufficient semantic information for the associated analysis task. Each of these coding paradigms is well suited to one type of vision but not the other. In particular, the image compression paradigm cannot guarantee that the semantic information needed by a specific task is preserved in low-bitrate coding scenarios, which compromises machine analysis efficiency. Conversely, although compact features are sufficient to support the corresponding vision task, they cannot be reconstructed into visual signals because too much information has been lost. Accordingly, a universal compression scheme that serves both human and machine vision well is highly desirable [1].