Learning-Based Multi-Frame Video Quality Enhancement
This paper, “Learning-Based Multi-Frame Video Quality Enhancement,” was presented by Junchao Tong, Xilin Wu, Dandan Ding, Zheng Zhu, and Zoe Liu at the IEEE International Conference on Image Processing (ICIP), September 22-25, 2019, in Taipei, Taiwan, and appears in the conference proceedings.
The convolutional neural network (CNN) has shown great success in video quality enhancement. Existing methods mainly conduct enhancement in the spatial domain, exploiting pixel correlations within a single frame. Taking advantage of the similarity across successive frames, this paper presents a learning-based multi-frame approach that leverages temporal correlation to explore the greatest potential for video quality enhancement.
High-level overview of how LMVE works:
LMVE is a novel approach that jointly leverages the spatial-temporal correlations among frames for better enhancement of compressed video. It categorizes the frames within one video into three quality levels and uses the high-quality and moderate-quality frames to enhance the low-quality ones in between. First, a learning-based optical flow is applied to compensate for the temporal motion across neighboring frames.
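The three-level categorization above can be sketched as follows. This is a minimal illustration, not the paper's implementation: in hierarchically coded video, frames encoded with a lower quantization parameter (QP) tend to have higher quality, so the sketch bins frames by QP. The function name and the QP thresholds are illustrative assumptions.

```python
# Hypothetical sketch: split frames into three quality levels by their
# quantization parameter (QP). Thresholds are illustrative, not from the paper.
def categorize_frames(frame_qps, high_thr=30, low_thr=36):
    """Map each frame index to 'high', 'moderate', or 'low' quality."""
    levels = {}
    for idx, qp in enumerate(frame_qps):
        if qp <= high_thr:
            levels[idx] = "high"      # lightly quantized, e.g. key frames
        elif qp <= low_thr:
            levels[idx] = "moderate"
        else:
            levels[idx] = "low"       # heavily quantized: enhancement targets
    return levels

# Example: a small hierarchical GOP where key frames get the lowest QP.
print(categorize_frames([28, 38, 34, 38, 28]))
```

With this labeling, the low-quality frames (indices 1 and 3 in the example) become the enhancement targets, flanked by higher-quality references.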
FlowNet is first adopted to estimate the optical flow between adjacent frames and generate motion-compensated frames. These compensated frames, together with the original low-quality frame, are then fed into a deep CNN structured in an early-fusion manner, which discovers the joint spatial-temporal correlations within the video.
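The two steps above, motion compensation followed by early fusion, can be sketched with NumPy. This is a simplified stand-in under stated assumptions: it uses nearest-neighbor warping rather than the learned FlowNet compensation, and "early fusion" is shown as channel-wise concatenation of the compensated neighbors with the low-quality frame, which is the usual meaning of the term; the function names are my own.

```python
import numpy as np

def warp_nearest(frame, flow):
    """Warp a frame (H, W, C) toward the target using a backward flow (H, W, 2).

    Nearest-neighbor sampling stands in for the learned, sub-pixel
    compensation used in the paper.
    """
    h, w = frame.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    src_x = np.clip(np.round(xs + flow[..., 0]).astype(int), 0, w - 1)
    src_y = np.clip(np.round(ys + flow[..., 1]).astype(int), 0, h - 1)
    return frame[src_y, src_x]

def early_fusion_input(low_q, comp_prev, comp_next):
    """Stack the low-quality frame with its two compensated neighbors
    along the channel axis, forming the CNN's fused input tensor."""
    return np.concatenate([comp_prev, low_q, comp_next], axis=-1)

# Three 4x4 RGB frames fuse into one 4x4 tensor with 9 channels.
fused = early_fusion_input(np.zeros((4, 4, 3)), np.zeros((4, 4, 3)),
                           np.zeros((4, 4, 3)))
print(fused.shape)  # (4, 4, 9)
```

The fused tensor is what an early-fusion network consumes in its first convolution, so spatial and temporal information are mixed from the very first layer.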
To ensure the generality of our CNN model, we further propose a robust training strategy. One high-quality frame and one moderate-quality frame are paired to enhance the low-quality frames between them, balancing the trade-off between frame distance and frame quality.
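One plausible reading of this pairing rule is: for each low-quality target, take the nearest high-quality frame and the nearest moderate-quality frame as references, so that reference quality is high while temporal distance (and thus motion) stays small. The sketch below encodes that reading; the exact selection rule in the paper may differ, and the function name is my own.

```python
# Hypothetical sketch of reference-pair selection for one low-quality frame.
# `levels` maps frame index -> "high" / "moderate" / "low".
def pick_reference_pair(levels, low_idx):
    """Return (nearest high-quality index, nearest moderate-quality index)."""
    def nearest(level):
        candidates = [i for i, lv in levels.items() if lv == level]
        return min(candidates, key=lambda i: abs(i - low_idx))
    return nearest("high"), nearest("moderate")

levels = {0: "high", 1: "low", 2: "moderate", 3: "low", 4: "high"}
print(pick_reference_pair(levels, 1))  # (0, 2)
print(pick_reference_pair(levels, 3))  # (4, 2)
```

Each low-quality frame thus trains against one reference of each quality class, which exposes the network to varied reference distances and qualities rather than a single fixed pattern.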
Experimental results demonstrate that LMVE achieves consistently superior results, outperforming prior work by 0.23 dB in PSNR on average.
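For context on the metric, PSNR (peak signal-to-noise ratio) is a log-scale measure of reconstruction fidelity, so a 0.23 dB average gain is a uniform multiplicative reduction in mean squared error across the test set. A standard implementation for 8-bit images:

```python
import numpy as np

def psnr(ref, test, peak=255.0):
    """Peak signal-to-noise ratio in dB between two 8-bit-range images."""
    mse = np.mean((ref.astype(np.float64) - test.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(peak ** 2 / mse)

# A uniform error of 16 levels gives MSE = 256 and PSNR of about 24.05 dB.
print(round(psnr(np.zeros((8, 8)), np.full((8, 8), 16.0)), 2))
```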
The code and model of the LMVE approach are published on GitHub at https://github.com/IVC-Projects/LMVE.