Video Prediction Using Multi-Scale Deep Neural Networks

N. Shayanfar; V. Derhami; M. Rezaeian

Article title: Video Prediction Using Multi-Scale Deep Neural Networks
National article ID: JR_JADM-10-3_011
Published in: 1401 (Solar Hijri calendar)

Author details:

N. Shayanfar - Computer Engineering Department, Yazd University, Yazd, Iran.
V. Derhami - Computer Engineering Department, Yazd University, Yazd, Iran.
M. Rezaeian - Computer Engineering Department, Yazd University, Yazd, Iran.

Abstract:

In video prediction, the goal is to predict the next frame of a video given a sequence of input frames. Although numerous studies have tackled frame prediction, satisfactory performance has not yet been achieved, so the task remains an open problem. This article studies multiscale processing for video prediction and presents a new network architecture for it. The architecture belongs to the broad family of autoencoders and comprises an encoder and a decoder. A pretrained VGG network serves as the encoder and processes a pyramid of input frames at multiple scales simultaneously. The decoder is based on 3D convolutional neurons. The architecture is evaluated on three datasets of varying difficulty and compared with two conventional autoencoders. The results indicate that combining the pretrained network with multiscale processing yields a well-performing approach.
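
The abstract does not say how the pyramid of input frames is built, so the following is only a minimal sketch of one common choice: repeated 2x2 average pooling of a grayscale frame. The names `downsample_2x`, `build_pyramid`, and `pyramid_for_sequence`, the default of three scales, and the averaging filter are all assumptions made for illustration and are not taken from the paper. The sketch also assumes frame dimensions divisible by 2^(num_scales - 1).

```python
from typing import List

# Hypothetical representation: a grayscale frame as rows of pixel intensities.
Frame = List[List[float]]


def downsample_2x(frame: Frame) -> Frame:
    """Halve each dimension by averaging non-overlapping 2x2 blocks."""
    # Ignore a trailing odd row or column, if any.
    height = len(frame) // 2 * 2
    width = len(frame[0]) // 2 * 2
    return [
        [
            (frame[r][c] + frame[r][c + 1]
             + frame[r + 1][c] + frame[r + 1][c + 1]) / 4.0
            for c in range(0, width, 2)
        ]
        for r in range(0, height, 2)
    ]


def build_pyramid(frame: Frame, num_scales: int = 3) -> List[Frame]:
    """Return [full resolution, 1/2, 1/4, ...] with num_scales levels."""
    levels = [frame]
    for _ in range(num_scales - 1):
        levels.append(downsample_2x(levels[-1]))
    return levels


def pyramid_for_sequence(frames: List[Frame],
                         num_scales: int = 3) -> List[List[Frame]]:
    """Group a frame sequence by scale: result[s][t] is frame t at scale s."""
    per_frame = [build_pyramid(f, num_scales) for f in frames]
    return [[p[s] for p in per_frame] for s in range(num_scales)]
```

Under these assumptions, calling `pyramid_for_sequence` on a list of 8x8 frames with three scales yields three groups of frames sized 8x8, 4x4, and 2x2, each of which could then be passed to an encoder.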

Keywords:

Deep learning, Convolutional autoencoder, Video prediction, Multiscale processing

Article page and full-text download: https://civilica.com/doc/1525744/