
VSRDiff: Learning Inter-Frame Temporal Coherence in Diffusion Model for Video Super-Resolution


The overview of our VSRDiff: A novel DM-based method for reconstruction-oriented VSR, enhancing perceptual quality while ensuring visual fidelity and temporal consistency...

Abstract:

Video Super-Resolution (VSR) aims to reconstruct high-quality high-resolution (HR) videos from low-resolution (LR) inputs. Recent studies have explored diffusion models (DMs) for VSR by exploiting their generative priors to produce realistic details. However, the inherent randomness of diffusion models presents significant challenges for controlling content. In particular, current DM-based VSR methods often neglect inter-frame temporal coherence and reconstruction-oriented objectives, leading to visual distortion and temporal inconsistency. In this paper, we introduce VSRDiff, a DM-based framework for VSR that emphasizes inter-frame temporal coherence and adopts a novel reconstruction perspective. Specifically, the Inter-Frame Aggregation Guidance (IFAG) module is developed to learn contextual inter-frame aggregation guidance, alleviating visual distortion caused by the randomness of diffusion models. Furthermore, the Progressive Reconstruction Sampling (PRS) approach is employed to generate reconstruction-oriented latents, balancing fidelity and detail richness. Additionally, temporal consistency is enhanced through second-order bidirectional latent propagation using the Flow-guided Latent Correction (FLC) module. Extensive experiments on the REDS4 and Vid4 datasets demonstrate that VSRDiff achieves highly competitive VSR performance with more realistic details, surpassing existing state-of-the-art methods in both visual fidelity and temporal consistency. Specifically, VSRDiff achieves the best scores on the REDS4 dataset in LPIPS, DISTS, and NIQE, with values of 0.1137, 0.0445, and 2.970, respectively. The results will be released at https://github.com/aigcvsr/VSRDiff.
Published in: IEEE Access (Volume: 13)
Page(s): 11447 - 11462
Date of Publication: 14 January 2025
Electronic ISSN: 2169-3536
