Disney Research and the University of California, Irvine are showing off a new artificial intelligence-enhanced video compression model that’s in the early stages of development.
In early testing, the new compression model yielded less distortion and lower bits-per-pixel rates than established codecs such as H.265 when trained on specialized video content. The researchers said they achieved comparable results on downscaled YouTube videos.
“Ultimately, every video compression approach works on a trade-off,” said research team leader Stephan Mandt, a UCI assistant professor of computer science who began the project while employed at Disney Research. “If I’m allowing for larger file sizes, then I can have better image quality. If I want to have a short, really small file size, then I have to tolerate some errors. The hope is that our neural network-based approach does a better trade-off overall between file size and quality.”
Mandt said video compression still relies heavily on prediction. Current compression algorithms achieve this through engineered solutions, such as computing the linear displacement of small, localized patches relative to their position in the previous frame.
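The patch-displacement idea can be sketched in a few lines. This toy example (not the codec's actual implementation; the function name, patch size, and search window are illustrative choices) finds, by exhaustive search, the integer shift that best matches a patch in the current frame to a patch in the previous one:

```python
import numpy as np

def best_displacement(prev, curr, y, x, patch=8, search=4):
    """Find the integer displacement (dy, dx) such that the patch at
    (y, x) in `curr` best matches the patch at (y+dy, x+dx) in `prev`,
    scored by sum of absolute differences over a small search window."""
    target = curr[y:y + patch, x:x + patch].astype(int)
    best, best_err = (0, 0), np.inf
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            py, px = y + dy, x + dx
            if py < 0 or px < 0 or py + patch > prev.shape[0] or px + patch > prev.shape[1]:
                continue  # candidate patch would fall outside the frame
            err = np.abs(prev[py:py + patch, px:px + patch].astype(int) - target).sum()
            if err < best_err:
                best_err, best = err, (dy, dx)
    return best

# Synthetic frames: a bright square moves two pixels to the right.
prev = np.zeros((32, 32), dtype=np.uint8)
prev[8:16, 8:16] = 255
curr = np.zeros((32, 32), dtype=np.uint8)
curr[8:16, 10:18] = 255
print(best_displacement(prev, curr, 8, 10))  # (0, -2): came from 2 px to the left
```

Real codecs refine this with sub-pixel motion and rate-aware search, but the core of the engineered approach is this kind of per-patch matching.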
“In contrast, deep neural networks take a data-centric approach and learn the video’s underlying dynamics by drawing on large datasets of video material,” wrote UCI in a news release. “These data-driven methods, enabled by advances in deep learning over the past decade, show promise for shrinking video file sizes in future generations of video compression codecs.”
The UCI/Disney Research team’s compression process first involves downscaling the dimensions of the video using a “variational autoencoder,” or a neural network that processes each video frame in a sequence of actions that results in a condensed array of numbers. The autoencoder then tries to undo this operation to ensure that the array contains enough information to restore the video frame. Then the algorithm attempts to guess the next compressed version of an image given what has gone before, relying on an AI-based technique called a “deep generative model.”
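The encode-then-undo step can be illustrated with a drastically simplified stand-in for the team's variational autoencoder: a linear encoder that maps a frame's pixels to a short latent array, and a decoder (here just the pseudoinverse, an assumption for illustration rather than anything from the paper) that tries to reverse the operation:

```python
import numpy as np

rng = np.random.default_rng(0)

# A 16x16 "frame" flattened to 256 pixel values.
frame = rng.random((16, 16))
x = frame.reshape(-1)

# Toy linear "encoder": 256 pixels -> a condensed array of 32 numbers.
W = rng.standard_normal((32, 256)) / 16.0
z = W @ x

# Toy "decoder": try to undo the encoding from the latent array alone.
x_hat = np.linalg.pinv(W) @ z

print(z.shape)                    # (32,) -- far fewer numbers than 256 pixels
print(np.mean((x - x_hat) ** 2))  # reconstruction error; training drives this down
```

A real variational autoencoder replaces both linear maps with trained deep networks and makes the latent probabilistic, but the shape of the operation is the same: frame in, condensed array out, approximate frame back.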
Mandt said this approach is not necessarily unique, but what sets the Disney/UCI research apart is how the algorithm encodes frame content: it rounds the autoencoder’s real-valued array to integers, which are easier to store than real numbers. Finally, the model applies lossless compression to the array, allowing for its exact restoration. During this process, the neural network informs the algorithm about which video frame to expect next, making the lossless compression step more efficient.
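A toy sketch of those two steps, rounding plus prediction-aided lossless compression. Here the neural network's next-frame prediction is replaced by the crudest possible stand-in, "the next latent equals the previous one" (an assumption for illustration), and `zlib` stands in for a proper entropy coder; the point is that coding residuals against a good prediction shrinks the losslessly compressed stream while still allowing exact restoration:

```python
import zlib
import numpy as np

rng = np.random.default_rng(1)

# Toy latent arrays for 10 frames: each drifts slightly from the last,
# mimicking the temporal redundancy a predictive model exploits.
latents = [rng.standard_normal(64)]
for _ in range(9):
    latents.append(latents[-1] + 0.05 * rng.standard_normal(64))

# Step 1: round the real-valued arrays to integers (easier to store).
quantized = [np.round(4 * z).astype(np.int8) for z in latents]

# Step 2: lossless compression, with and without prediction.
raw = zlib.compress(b"".join(q.tobytes() for q in quantized))
residuals = [quantized[0]] + [
    quantized[i] - quantized[i - 1] for i in range(1, len(quantized))
]
predicted = zlib.compress(b"".join(r.tobytes() for r in residuals))

# Exact restoration: summing residuals recovers every quantized array.
restored = np.cumsum(np.stack(residuals), axis=0).astype(np.int8)

print(len(raw), len(predicted))  # residual stream compresses smaller
```

In the actual system the predictor is the trained deep generative model and the lossless stage uses the model's predicted distribution directly, but the principle is the same: the better the prediction, the fewer bits the residual information costs.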
“The real contribution here was to combine this neural network-based deep generative video prediction model with everything else that belongs to compression algorithms, such as rounding and model-based lossless compression,” said Mandt. “Because the receiver requires a trained neural network for reconstructing the video, you might also have to think about how you transmit it along with the data. There are lots of open questions still. It’s a very early stage.”