Scalable Face Image Coding via StyleGAN Prior: Toward Compression for Human-Machine Collaborative Vision

Abstract:

The accelerated proliferation of visual content and the rapid development of machine vision technologies bring significant challenges in delivering visual data on a gigantic scale, data that must be represented effectively to satisfy both human and machine requirements. In this work, we investigate how hierarchical representations derived from an advanced generative prior facilitate the construction of an efficient scalable coding paradigm for human-machine collaborative vision. Our key insight is that by exploiting the StyleGAN prior, we can learn three-layered representations encoding hierarchical semantics. These are organized into basic, middle, and enhanced layers that support machine intelligence and human visual perception in a progressive fashion. To achieve efficient compression, we propose a layer-wise scalable entropy transformer that reduces the redundancy between layers. Based on a multi-task scalable rate-distortion objective, the proposed scheme is jointly optimized for machine analysis performance, human perceptual experience, and compression ratio. We validate the feasibility of the proposed paradigm on face image compression. Extensive qualitative and quantitative experimental results demonstrate the superiority of the proposed paradigm over the latest compression standard, Versatile Video Coding (VVC), in terms of both machine analysis and human perception at extremely low bitrates (< 0.01 bpp), offering new insights for human-machine collaborative compression.
Published in: IEEE Transactions on Image Processing (Volume: 33)
Page(s): 408 - 422
Date of Publication: 22 December 2023

PubMed ID: 38133987

I. Introduction

Recent years have witnessed an exponential increase in the amount of image/video data due to the rapid development of various multimedia applications. Consequently, highly efficient compression of images and videos has remained a fundamental challenge in multimedia communication and processing for decades. Early on, images and videos were primarily intended for human viewing and entertainment. As machine vision technologies advance, a growing volume of visual data must be analyzed for intelligent applications, imposing new challenges for machine vision-oriented data compression. The demands that human vision and machine analysis place on compression differ fundamentally. The traditional image compression paradigm for human vision aims to maintain signal fidelity as much as possible under a bit rate budget. In machine vision, the common practice is to extract and compress compact features that contain sufficient semantic information for the associated analysis task. Each of these two coding paradigms is well suited to one kind of vision but not the other. In particular, the image compression paradigm cannot guarantee that the semantic information needed by specific tasks is preserved in low-bitrate coding scenarios, which compromises machine analysis efficiency. Conversely, although a compact feature is sufficient to support its corresponding vision task, it cannot be reconstructed into a visual signal because too much information has been lost. Accordingly, a universal compression scheme that serves both human and machine vision well is highly desirable [1].
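
To give a sense of scale for the bit rate budget discussed above, the following minimal sketch converts a bits-per-pixel (bpp) figure into a byte budget for a single image. The 0.01 bpp value is the threshold quoted in the abstract. The 1024 x 1024 resolution, the 24-bit RGB baseline, and the helper names are assumptions made only for illustration and are not taken from the paper.

```python
def bits_per_pixel(num_bytes: float, width: int, height: int) -> float:
    """Bitrate of a compressed image, in bits per pixel."""
    return num_bytes * 8 / (width * height)


def byte_budget(bpp: float, width: int, height: int) -> float:
    """Number of bytes available for one image at a given bpp."""
    return bpp * width * height / 8


# Assumed example resolution (hypothetical, not stated in the source).
WIDTH, HEIGHT = 1024, 1024

# Byte budget at the 0.01 bpp threshold mentioned in the abstract.
budget = byte_budget(0.01, WIDTH, HEIGHT)

# Size of the same image stored as raw 24-bit RGB (3 bytes per pixel).
raw_bytes = WIDTH * HEIGHT * 3

# Compression ratio relative to the raw 24-bit representation.
ratio = raw_bytes / budget

print(f"budget at 0.01 bpp: {budget:.2f} bytes")
print(f"raw 24-bit RGB:     {raw_bytes} bytes")
print(f"compression ratio:  {ratio:.0f}:1")
```

Under these assumptions, 0.01 bpp leaves roughly 1.3 kB for the whole image, about 2400 times smaller than the raw 24-bit pixels. This illustrates why signal-fidelity coding struggles to preserve task-relevant semantics at such rates.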
