A Review of Video Generation Approaches


Abstract:

Generating videos from a few initial frames is an appealing field of research in deep learning. An ever-expanding array of approaches aims to generate long-range, realistic video frame sequences. Video generation can help predict trajectories and even model object movements, enhancing autonomous robots. However, there are only a few comprehensive studies that review the various approaches on the basis of their relative advantages, disadvantages, and evolution. Hence, this paper presents a detailed overview of deep learning based approaches employed to tackle the complex problem of video generation. The approaches reviewed include Variational Autoencoders (VAEs), Generative Adversarial Networks (GANs), and Transformer models. Finally, the performance of all the approaches is examined and compared on the BAIR Robot Pushing dataset.
Date of Conference: 17-19 December 2020
Date Added to IEEE Xplore: 01 March 2021
Conference Location: Thrissur, India

I. Introduction

Video generation has gained substantial traction as a research topic. Researchers around the world are developing methods to generate long-range, high-resolution, natural-looking videos. This popularity stems from the critical advantages attached to the task: prominent fields with great relevance to video generation include reinforcement learning [1] and motion planning for autonomous systems [2]. Although it is an interesting task, it is a very difficult problem to model, owing to its multi-modal nature and the exponentially growing tree of possible futures after the initial frames have passed. Video generation has numerous real-world applications, including interpolating full-sized videos from cropped videos and generating scenes in video games. Further, it can be used to generate videos that serve as datasets for machine learning models. In robotics, video generation can play a vital role in handling occlusion, predicting object movement [3], and optimizing trajectories.
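
To make the multi-modal nature of the problem concrete, the sketch below shows the general shape of a VAE-style stochastic frame predictor in the spirit of models such as SVG [5]: a posterior network encodes a (context frame, target frame) pair into a latent z, and at sampling time different draws of z decode into different plausible next frames. This is a minimal illustrative sketch in PyTorch, not the architecture of any specific paper reviewed here; the class name, the fully connected layers, and all dimensions are assumptions made for brevity.

```python
import torch
import torch.nn as nn

class StochasticFramePredictor(nn.Module):
    """Minimal VAE-style next-frame predictor (illustrative only)."""

    def __init__(self, channels=3, size=64, z_dim=16, hidden=128):
        super().__init__()
        flat = channels * size * size
        # Posterior network: (context frame, target frame) -> q(z | x_t, x_{t+1}).
        self.posterior = nn.Sequential(
            nn.Linear(2 * flat, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * z_dim),  # predicts mean and log-variance
        )
        # Decoder: (context frame, sampled z) -> predicted next frame.
        self.decoder = nn.Sequential(
            nn.Linear(flat + z_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, flat), nn.Sigmoid(),
        )
        self.z_dim = z_dim

    def forward(self, context, target):
        b = context.size(0)
        c, t = context.flatten(1), target.flatten(1)
        mu, logvar = self.posterior(torch.cat([c, t], dim=1)).chunk(2, dim=1)
        # Reparameterization trick: z = mu + sigma * eps.
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()
        pred = self.decoder(torch.cat([c, z], dim=1)).view_as(target)
        # KL term regularizes the posterior toward the N(0, I) prior.
        kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp()) / b
        return pred, kl

    @torch.no_grad()
    def sample(self, context, n=3):
        # Different latent samples yield different plausible futures,
        # which is exactly the multi-modality discussed above.
        c = context.flatten(1)
        z = torch.randn(n, c.size(0), self.z_dim)
        return torch.stack(
            [self.decoder(torch.cat([c, z[i]], dim=1)) for i in range(n)]
        )

if __name__ == "__main__":
    model = StochasticFramePredictor()
    context = torch.rand(4, 3, 64, 64)    # a batch of 4 context frames
    futures = model.sample(context, n=3)  # 3 distinct candidate next frames
    print(futures.shape)                  # torch.Size([3, 4, 12288])
```

The KL-regularized objective and the reparameterization step follow the standard VAE formulation of Kingma and Welling [6]; the reviewed video models build far richer recurrent and convolutional architectures on top of this basic scheme.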

References
1. Yuxi Li. Deep reinforcement learning: An overview, 2018.
2. Tianyu Gu and John M. Dolan. On-road motion planning for autonomous vehicles, 2012.
3. Khush Agrawal and Rohit Lal. Person following mobile robot using multiplexed detection and tracking. In Vilas R. Kalamkar and Katarina Monkova, editors, Advances in Mechanical Engineering, pages 815-822, Singapore, 2021. Springer Singapore. ISBN 978-981-15-3639-7.
4. Chelsea Finn, Ian Goodfellow, and Sergey Levine. Unsupervised learning for physical interaction through video prediction, 2016.
5. Emily Denton and Rob Fergus. Stochastic video generation with a learned prior, 2018.
6. Diederik P. Kingma and Max Welling. Auto-encoding variational bayes, 2014.
7. Manoj Kumar, Mohammad Babaeizadeh, Dumitru Erhan, Chelsea Finn, Sergey Levine, Laurent Dinh, and Durk Kingma. Videoflow: A conditional flow-based model for stochastic video generation, 2020.
8. Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial networks, 2014.
9. Alex X. Lee, Richard Zhang, Frederik Ebert, Pieter Abbeel, Chelsea Finn, and Sergey Levine. Stochastic adversarial video prediction, 2018.
10. Ruslan Rakhimov, Denis Volkhonskiy, Alexey Artemov, Denis Zorin, and Evgeny Burnaev. Latent video transformer, 2020.
11. Aidan Clark, Jeff Donahue, and Karen Simonyan. Adversarial video generation on complex datasets, 2019.
12. Pauline Luc, Aidan Clark, Sander Dieleman, Diego de Las Casas, Yotam Doron, Albin Cassirer, and Karen Simonyan. Transformation-based adversarial video prediction on large-scale data, 2020.
13. Jonathan Ho, Nal Kalchbrenner, Dirk Weissenborn, and Tim Salimans. Axial attention in multidimensional transformers, 2019.
14. Dirk Weissenborn, Oscar Täckström, and Jakob Uszkoreit. Scaling autoregressive video models, 2020.
15. Michael Mathieu, Camille Couprie, and Yann LeCun. Deep multi-scale video prediction beyond mean square error, 2016.
16. Nitish Srivastava, Elman Mansimov, and Ruslan Salakhutdinov. Unsupervised learning of video representations using lstms, 2016.
17. Junhyuk Oh, Xiaoxiao Guo, Honglak Lee, Richard Lewis, and Satinder Singh. Action-conditional video prediction using deep networks in atari games, 2015.
18. Carl Vondrick, Hamed Pirsiavash, and Antonio Torralba. Anticipating visual representations from unlabeled video, 2016.
19. Sepp Hochreiter and Jürgen Schmidhuber. Long short-term memory. Neural Computation, 9(8):1735-1780, 1997. ISSN 0899-7667. doi: 10.1162/neco.1997.9.8.1735.
20. Catalin Ionescu, Dragos Papava, Vlad Olaru, and Cristian Sminchisescu. Human3.6m: Large scale datasets and predictive methods for 3d human sensing in natural environments. IEEE Transactions on Pattern Analysis and Machine Intelligence, 36(7):1325-1339, July 2014.
21. James M. Joyce. Kullback-Leibler divergence. In International Encyclopedia of Statistical Science, pages 720-722. Springer Berlin Heidelberg, Berlin, Heidelberg, 2011. ISBN 978-3-642-04898-2. doi: 10.1007/978-3-642-04898-2_327. URL https://doi.org/10.1007/978-3-642-04898-2_327.
22. Murtaza Dalal, Alexander C. Li, and Rohan Taori. Autoregressive models: What are they good for?, 2019.
23. Laurent Dinh, David Krueger, and Yoshua Bengio. Nice: Non-linear independent components estimation, 2015.
24. Laurent Dinh, Jascha Sohl-Dickstein, and Samy Bengio. Density estimation using real nvp, 2017.
25. Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. Attention is all you need, 2017.
26. Xizhou Zhu, Dazhi Cheng, Zheng Zhang, Stephen Lin, and Jifeng Dai. An empirical study of spatial attention mechanisms in deep networks, 2019.
27. Aaron van den Oord, Nal Kalchbrenner, Oriol Vinyals, Lasse Espeholt, Alex Graves, and Koray Kavukcuoglu. Conditional image generation with pixelcnn decoders, 2016.
28. Tim Salimans, Andrej Karpathy, Xi Chen, and Diederik P. Kingma. Pixelcnn++: Improving the pixelcnn with discretized logistic mixture likelihood and other modifications, 2017.
29. Xi Chen, Nikhil Mishra, Mostafa Rohaninejad, and Pieter Abbeel. Pixelsnail: An improved autoregressive generative model, 2017.
30. Jacob Menick and Nal Kalchbrenner. Generating high fidelity images with subscale pixel networks and multi-dimensional upscaling, 2018.
