Towards Fine-Grained Human Pose Transfer With Detail Replenishing Network

Abstract:

Human pose transfer (HPT) is an emerging research topic with huge potential in fashion design, media production, online advertising and virtual reality. For these applications, the visual realism of fine-grained appearance details is crucial for production quality and user engagement. However, existing HPT methods often suffer from three fundamental issues: detail deficiency, content ambiguity and style inconsistency, which severely degrade the visual quality and realism of generated images. Aiming towards real-world applications, we develop a more challenging yet practical HPT setting, termed Fine-grained Human Pose Transfer (FHPT), with a higher focus on semantic fidelity and detail replenishment. Concretely, we analyze the potential design flaws of existing methods via an illustrative example, and establish the core FHPT methodology by combining the ideas of content synthesis and feature transfer in a mutually-guided fashion. Thereafter, we substantiate the proposed methodology with a Detail Replenishing Network (DRN) and a corresponding coarse-to-fine model training scheme. Moreover, we build a complete suite of fine-grained evaluation protocols to address the challenges of FHPT in a comprehensive manner, including semantic analysis, structural detection and perceptual quality assessment. Extensive experiments on the DeepFashion benchmark dataset have verified the power of the proposed benchmark against state-of-the-art works, with a 12%–14% gain on top-10 retrieval recall, 5% higher joint localization accuracy, and a nearly 40% gain on face identity preservation. Our code, models and evaluation tools will be released at https://github.com/Lotayou/RATE

Published in: IEEE Transactions on Image Processing (Volume: 30)

Page(s): 2422 - 2435

Date of Publication: 25 January 2021

PubMed ID: 33493117

DOI: 10.1109/TIP.2021.3052364

Lingbo Yang

Video Coding Laboratory, Institute of Digital Media, Peking University (PKU-IDM-VCL), Beijing, China

Lingbo Yang received the B.S. degree in mathematics and applied mathematics from Peking University in 2016. He is currently pursuing the Ph.D. degree with the Institute of Digital Media, Peking University. He has been interning at the DAMO Academy, Alibaba Group, since 2019. His research interests include deep generative models, image restoration and editing, and human pose transfer. In 2020, he has authored six peer-revi...

Pan Wang

Video Coding Laboratory, Institute of Digital Media, Peking University (PKU-IDM-VCL), Beijing, China

Pan Wang received the B.S. degree in information and control engineering from the China University of Petroleum, Qingdao, China, in 2013, and the M.S. degree in computer science from the School of Electronic, Electrical and Communication Engineering, University of Chinese Academy of Sciences, Beijing, China, in 2018. He is currently a Computer Vision Algorithm Engineer with the Alibaba DAMO Academy. His current research i...

Chang Liu

School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing, China

Chang Liu received the B.S. degree from Jilin University, Jilin, China, in 2012. He is currently pursuing the Ph.D. degree with the School of Electronic, Electrical and Communication Engineering, University of Chinese Academy of Sciences, Beijing, China. His research interests include computer vision and machine learning, specifically for neural architecture design and visual object detection. He has published more than t...

Zhanning Gao

Video Coding Laboratory, Institute of Digital Media, Peking University (PKU-IDM-VCL), Beijing, China

Zhanning Gao received the B.S. degree in automatic control engineering from Xi’an Jiaotong University, Xi’an, China, in 2012. He is currently pursuing the Ph.D. degree with the Institute of Artificial Intelligence and Robotics, Xi’an Jiaotong University. He was a Research Intern with Visual Computing Group, Microsoft Research Asia, from 2015 to 2017. His research interests include compact image/video representation, large ...

Peiran Ren

Video Coding Laboratory, Institute of Digital Media, Peking University (PKU-IDM-VCL), Beijing, China

Peiran Ren received the B.Sc. and Ph.D. degrees from Tsinghua University, China, in 2008 and 2014, respectively. He is currently a Senior Algorithm Engineer with the Alibaba DAMO Academy. His research interests include image and video enhancement and processing, computer aided design, real-time rendering, and appearance acquisition.

Xinfeng Zhang

School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing, China

Xinfeng Zhang received the B.Sc. degree from the Hebei University of Technology, in 2007, and the Ph.D. degree from the Chinese Academy of Sciences, in 2014. He served as a research fellow/postdoc at Nanyang Technological University, University of Southern California, and City University of Hong Kong. He is currently an Assistant Professor with the Department of Computer Science, University of Chinese Academy of Sciences. He has authored 20 technical proposals to ISO/MPEG, ITU-T, and AVS standards and more than 100 refereed journal and conference papers. His research interests include video compression, image/video quality assessment, and image/video analysis. He received the Best Paper Award at the 2017 Pacific-Rim Conference on Multimedia, the Best Paper Award of IEEE Multimedia 2018, and is the coauthor of a paper that received the Best Student Paper Award in the IEEE International Conference on Image Processing 2018.

Shanshe Wang

Institute of Digital Media, Peking University, Beijing, China

Shanshe Wang (Member, IEEE) received the B.S. degree from the Department of Mathematics, Heilongjiang University, Harbin, China, in 2004, the M.S. degree in computer software and theory from Northeast Petroleum University, Daqing, China, in 2010, and the Ph.D. degree in computer science from the Harbin Institute of Technology. He held a postdoctoral position with Peking University, from 2016 to 2018. He joined the School of Electronics Engineering and Computer Science, Institute of Digital Media, Peking University, Beijing, where he is currently a Research Assistant Professor. His current research interests include video compression and image and video quality assessment.

Siwei Ma

Institute of Digital Media, Peking University, Beijing, China

Siwei Ma (Member, IEEE) received the B.Sc. degree from Shandong Normal University, in 1999, and the Ph.D. degree in computer science from the Institute of Computing Technology, Chinese Academy of Sciences, in 2005. He worked as a postdoc with the University of Southern California, from 2005 to 2007. He joined the Institute of Digital Media, Peking University, where he is currently a Professor. He has authored over 200 technical articles in refereed journals and proceedings in image and video coding, video processing, video streaming, and transmission. He is an Associate Editor of the IEEE Transactions on Circuits and Systems for Video Technology and the Journal of Visual Communication and Image Representation.

Xiansheng Hua

Video Coding Laboratory, Institute of Digital Media, Peking University (PKU-IDM-VCL), Beijing, China

Xiansheng Hua (Fellow, IEEE) received the B.S. and Ph.D. degrees in applied mathematics from Peking University, Beijing, in 1996 and 2001, respectively. In 2001, he joined Microsoft Research Asia as a Researcher. He became a Researcher and the Senior Director of the Alibaba Group in 2015. He has authored or coauthored over 250 research articles and has filed over 90 patents. His research interests have been in the areas of multimedia search, advertising, understanding, and mining, and pattern recognition and machine learning. He was honored as one of the recipients of MIT35. He served as a Program Co-Chair for the IEEE ICME 2013, the ACM Multimedia 2012, and the IEEE ICME 2012, and on the Technical Directions Board of the IEEE Signal Processing Society. He is an ACM Distinguished Scientist.

Wen Gao

Institute of Digital Media, Peking University, Beijing, China

Wen Gao (Fellow, IEEE) received the Ph.D. degree in electronic engineering from The University of Tokyo in 1991. He was a Professor of computer science with the Harbin Institute of Technology from 1991 to 1995 and with the Institute of Computing Technology, Chinese Academy of Sciences, from 1996 to 2006. He is currently a Professor of computer science with Peking University. He has authored extensively, including five books and more than 600 technical articles in refereed journals and conference proceedings in the areas of image processing, video coding and communication, pattern recognition, multimedia information retrieval, multimodal interface, and bioinformatics. He chaired a number of prestigious international conferences on multimedia and video signal processing, such as IEEE ISCAS, ICME, and the ACM Multimedia, and also served on the advisory and technical committees for numerous professional organizations. He served or serves on the Editorial Board for several journals, including TCSVT, TMM, TIP, and TAMD.
