Abstract:
Given a video sequence, video frame interpolation aims to synthesize an intermediate frame between two consecutive frames. In this paper, we propose a multi-scale position feature transform (MS-PFT) network for video frame interpolation, in which two parallel prediction networks and one optimization network are designed to predict the features of the target frame and to generate the final interpolation result, respectively. To increase the fidelity of the synthesized frames, we propose to apply a position feature transform (PFT) layer in the residual blocks of the prediction networks to estimate scaling factors that evaluate how important the deep features around a target pixel are. A PFT layer utilizes optical flow to extract and generate position features, which then adjust the learning process of our model. We further extend our model into a multi-scale structure in which every scale of the network shares the same parameters, maximizing the efficiency of the network while keeping the model size unchanged. Experiments show that our method handles challenging scenarios such as occlusion and large motion effectively, and that it outperforms state-of-the-art approaches on different datasets.
Published in: IEEE Transactions on Circuits and Systems for Video Technology (Volume: 30, Issue: 11, November 2020)
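
To make the PFT idea concrete, below is a minimal PyTorch sketch of a residual block modulated by a PFT-style layer, following the abstract's description: position features derived from optical flow are mapped to per-pixel scaling factors that reweight the deep features around each target pixel. All layer names, channel sizes, and the exact modulation rule (elementwise multiplication by sigmoid-bounded factors) are illustrative assumptions, not the authors' released code.

```python
# Hypothetical sketch of a PFT residual block; architecture details are
# assumptions based only on the abstract, not the paper's implementation.
import torch
import torch.nn as nn


class PFTLayer(nn.Module):
    """Maps optical flow (B, 2, H, W) to per-pixel scaling factors."""

    def __init__(self, channels: int, hidden: int = 32):
        super().__init__()
        self.pos_encoder = nn.Sequential(
            nn.Conv2d(2, hidden, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden, channels, kernel_size=3, padding=1),
            nn.Sigmoid(),  # bound scaling factors to (0, 1) as importance weights
        )

    def forward(self, feat: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
        scale = self.pos_encoder(flow)  # position features -> scaling factors
        return feat * scale             # reweight deep features per pixel


class PFTResBlock(nn.Module):
    """Residual block whose intermediate features are modulated by a PFT layer."""

    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.pft = PFTLayer(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
        out = self.relu(self.conv1(x))
        out = self.pft(out, flow)  # position-aware feature reweighting
        out = self.conv2(out)
        return x + out             # residual connection


if __name__ == "__main__":
    block = PFTResBlock(channels=64)
    feat = torch.randn(1, 64, 128, 128)  # deep features from a prediction network
    flow = torch.randn(1, 2, 128, 128)   # optical flow between the two input frames
    print(block(feat, flow).shape)       # torch.Size([1, 64, 128, 128])
```

In the multi-scale structure the abstract describes, one would apply this same block (with the same parameters) to downsampled feature and flow pyramids at each scale, which is what keeps the model size unchanged as scales are added.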