Multimodal Inputs Driven Talking Face Generation With Spatial–Temporal Dependency


Abstract:

Given an arbitrary speech clip or text information as input, the proposed work aims to generate a talking face video with accurate lip synchronization. Existing works mainly have three limitations. (1) Single-modal learning is adopted with either audio or text as input, hence it lacks the complementarity of multimodal inputs. (2) Each frame is generated independently, hence the temporal dependency between consecutive frames is ignored. (3) Each face image is generated by a traditional convolutional neural network (CNN) with a local receptive field, hence it cannot effectively capture the spatial dependency within internal representations of face images. To overcome these problems, we decompose the talking face generation task into two steps: mouth landmarks prediction and video synthesis. First, a multimodal learning method is proposed to generate accurate mouth landmarks with multimedia inputs (both text and audio). Second, a network named Face2Vid is proposed to generate video frames conditioned on the predicted mouth landmarks. In Face2Vid, optical flow is employed to model the temporal dependency between frames, while a self-attention mechanism is introduced to model the spatial dependency across image regions. Extensive experiments demonstrate that our approach can generate photo-realistic video frames with the background, and that it excels at accurate synchronization of lip movements and smooth transition of facial movements.
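
The two dependency mechanisms named in the abstract can be illustrated with a minimal sketch. This is not the Face2Vid implementation, which this page does not describe in detail. The function names, the identity query/key/value projections, and the integer-valued flow field are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def spatial_self_attention(features):
    """Self-attention over the positions of a flattened feature map.

    features: list of N position vectors, each a list of C floats.
    Assumption: query, key and value projections are the identity;
    a trained network would use learned projections instead.
    """
    dim = len(features[0])
    scale = 1.0 / math.sqrt(dim)
    output = []
    for query in features:
        # Score this position against every position, near or far.
        scores = [scale * sum(q * k for q, k in zip(query, key))
                  for key in features]
        weights = softmax(scores)
        # Weighted sum over all positions: a global receptive field,
        # unlike the local window of a plain convolution.
        attended = [sum(w * value[c] for w, value in zip(weights, features))
                    for c in range(dim)]
        output.append(attended)
    return output


def warp_with_flow(frame, flow):
    """Backward-warp a 2-D grayscale frame with an integer flow field.

    flow[y][x] = (dx, dy) means output pixel (x, y) is sampled from
    (x + dx, y + dy) in the given frame; coordinates are clamped at
    the border. Real systems use sub-pixel flow with interpolation.
    """
    height, width = len(frame), len(frame[0])
    warped = []
    for y in range(height):
        row = []
        for x in range(width):
            dx, dy = flow[y][x]
            sx = min(max(x + dx, 0), width - 1)
            sy = min(max(y + dy, 0), height - 1)
            row.append(frame[sy][sx])
        warped.append(row)
    return warped
```

In this sketch, attention lets every output position draw on every other position of the feature map, and warping reuses pixels of the previous frame along the flow so that consecutive frames stay consistent.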

Published in: IEEE Transactions on Circuits and Systems for Video Technology (Volume: 31, Issue: 1, January 2021)

Page(s): 203 - 216

Date of Publication: 12 February 2020

DOI: 10.1109/TCSVT.2020.2973374

Department of Automation, University of Science and Technology of China, Hefei, China

Lingyun Yu received the B.S. degree in electrical engineering and automation from the China University of Mining and Technology, in 2015. She is currently pursuing the Ph.D. degree with the University of Science and Technology of China. Her research interests cover talking face generation, multimodal learning, articulatory movements-driven 3D talking head, human–computer interaction, and video synthesis.

Department of Automation, University of Science and Technology of China, Hefei, China

Jun Yu (Member, IEEE) is currently an Associate Professor with the University of Science and Technology of China, Hefei, China. He has authored or coauthored more than 100 articles. His research interests include human–computer interaction and intelligent robots. He is a member of the Technical Committee-Biological Information and Artificial Life, Chinese Association for Artificial Intelligence. He received the Best Paper Finalist from ICME 2017 and the Great Challenge Champion from PRCV 2018.

Department of Automation, University of Science and Technology of China, Hefei, China

Mengyan Li received the B.E. degree in electrical engineering from the Hefei University of Technology in 2017. He is currently pursuing the M.S. degree in automation with the University of Science and Technology of China. His research interests include computer vision, deep learning applied to image super-resolution and image retrieval, and face analysis techniques.

Department of Automation, University of Science and Technology of China, Hefei, China

Qiang Ling (Senior Member, IEEE) received the B.S. degree from the University of Science and Technology of China, Hefei, China, in 1997, the M.E. degree from Tsinghua University, Beijing, China, in 2000, and the Ph.D. degree from the University of Notre Dame, Notre Dame, IN, USA, in 2005. He is currently a Professor with the Department of Automation, University of Science and Technology of China. He worked as a Research Staff Member with Seagate Technology from 2005 to 2008. He joined the University of Science and Technology of China in 2008. His research interests include networked control systems and signal processing. He is currently serving as an Associate Editor for the IEEE Control Systems Society Conference Editorial Board.
