Diffusion Probabilistic Modeling for Video Generation

Ruihan Yang; Prakhar Srivastava; Stephan Mandt

doi:10.3390/e25101469

Entropy (Basel). 2023 Oct 20;25(10):1469. doi: 10.3390/e25101469.

Authors

Ruihan Yang¹, Prakhar Srivastava¹, Stephan Mandt¹

Affiliation

¹ Department of Computer Science, University of California, Irvine, CA 92697, USA.

Abstract

Denoising diffusion probabilistic models are a promising new class of generative models that mark a milestone in high-quality image generation. This paper showcases their ability to sequentially generate video, surpassing prior methods in perceptual and probabilistic forecasting metrics. We propose an autoregressive, end-to-end optimized video diffusion model inspired by recent advances in neural video compression. The model successively generates future frames by correcting a deterministic next-frame prediction using a stochastic residual generated by an inverse diffusion process. We compare this approach against six baselines on four datasets involving natural and simulation-based videos. We find significant improvements in terms of perceptual quality and probabilistic frame forecasting ability for all datasets.
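The abstract describes each new frame as a deterministic next-frame prediction corrected by a stochastic residual that an inverse diffusion process generates. The following is a minimal, pure-Python sketch of that decomposition under loudly stated assumptions, not the authors' implementation. Frames are flat lists of floats. `linear_extrapolation` is a hypothetical stand-in for the learned next-frame predictor. `gaussian_eps_model` is a hypothetical stand-in for a trained noise-prediction network; it is the exact denoiser only if residuals were i.i.d. Gaussian. The reverse process uses a standard DDPM ancestral update with a linear beta schedule and σ_k² = β_k. These are common defaults, not details taken from this paper.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Frame = List[float]  # toy "frame": a flat list of pixel values
Schedule = Tuple[List[float], List[float], List[float]]


def make_schedule(num_steps: int = 100, beta_start: float = 1e-4,
                  beta_end: float = 0.02) -> Schedule:
    """Linear beta schedule (assumed); returns betas, alphas, alpha_bars."""
    denom = max(num_steps - 1, 1)
    betas = [beta_start + (beta_end - beta_start) * k / denom for k in range(num_steps)]
    alphas = [1.0 - b for b in betas]
    alpha_bars, prod = [], 1.0
    for a in alphas:
        prod *= a
        alpha_bars.append(prod)  # cumulative product of alphas
    return betas, alphas, alpha_bars


def gaussian_eps_model(residual_std: float) -> Callable[[Frame, float], Frame]:
    """HYPOTHETICAL stand-in for a trained noise predictor.

    If residuals were i.i.d. N(0, residual_std**2), this returns the exact
    posterior mean E[eps | y_k]. A real model would also see the context frames.
    """
    def eps_hat(y_k: Frame, alpha_bar: float) -> Frame:
        var = alpha_bar * residual_std ** 2 + (1.0 - alpha_bar)
        scale = math.sqrt(1.0 - alpha_bar) / var
        return [scale * v for v in y_k]
    return eps_hat


def sample_residual(dim: int, eps_model: Callable[[Frame, float], Frame],
                    schedule: Schedule, rng: random.Random) -> Frame:
    """Inverse diffusion: start from pure noise and denoise step by step."""
    betas, alphas, alpha_bars = schedule
    y = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # y_K ~ N(0, I)
    for k in reversed(range(len(betas))):
        eps = eps_model(y, alpha_bars[k])
        coef = betas[k] / math.sqrt(1.0 - alpha_bars[k])
        mean = [(yi - coef * ei) / math.sqrt(alphas[k]) for yi, ei in zip(y, eps)]
        if k > 0:
            sigma = math.sqrt(betas[k])  # assumed variance choice sigma_k^2 = beta_k
            y = [m + sigma * rng.gauss(0.0, 1.0) for m in mean]
        else:
            y = mean  # no noise is added on the final step
    return y


def linear_extrapolation(context: Sequence[Frame]) -> Frame:
    """HYPOTHETICAL deterministic predictor: x_{t+1} ~ 2 x_t - x_{t-1}."""
    prev, last = context[-2], context[-1]
    return [2.0 * b - a for a, b in zip(prev, last)]


def generate_video(context: Sequence[Frame], num_new_frames: int,
                   predictor: Callable[[Sequence[Frame]], Frame],
                   eps_model: Callable[[Frame, float], Frame],
                   schedule: Schedule, rng: random.Random) -> List[Frame]:
    """Autoregressive rollout: new frame = deterministic prediction + residual."""
    frames = [list(f) for f in context]
    for _ in range(num_new_frames):
        mu = predictor(frames)
        r = sample_residual(len(mu), eps_model, schedule, rng)
        frames.append([m + ri for m, ri in zip(mu, r)])  # correct the prediction
    return frames[len(context):]


if __name__ == "__main__":
    rng = random.Random(0)
    clip = generate_video([[0.0, 0.0], [1.0, 0.5]], 3, linear_extrapolation,
                          gaussian_eps_model(0.1), make_schedule(), rng)
    print(len(clip), len(clip[0]))  # 3 2
```

Splitting each frame into a predicted mean and a sampled residual means the diffusion process only has to model what the deterministic predictor gets wrong. Setting `residual_std` to 0 in this toy reduces the rollout to pure extrapolation, which makes the role of each component easy to check.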

Keywords: autoregressive models; deep generative models; diffusion models; video generation.
