Project Details

Conditional Coding for Learned Image and Video Compression

Subject Area Communication Technology and Networks, High-Frequency Technology and Photonic Systems, Signal Processing and Machine Learning for Information Technology
Term since 2022
Project identifier Deutsche Forschungsgemeinschaft (DFG) - Project number 508272532
 
This joint research project between the Institut für Informationsverarbeitung (TNT) of the Leibniz Universität Hannover (LUH) and the Department of Computer Science of the National Yang Ming Chiao Tung University (NYCU) in Taiwan addresses end-to-end learned video compression from the perspective of conditional coding with a meta-learning-based regularization and tailoring scheme.

The arrival of deep learning has spurred a new wave of developments in end-to-end learned compression. Recent years have witnessed the success of learned image compression, with the state of the art showing better MS-SSIM results than (and comparable PSNR results to) VVC Intra. By comparison, the development of end-to-end learned video compression is still at an early stage. Most learned video codecs follow the traditional hybrid coding architecture, namely temporal prediction followed by transform-based residual coding. A recent publication indicates that although state-of-the-art learned video codecs show better results than x265, they can hardly compete with the HEVC Test Model (HM) under more realistic test conditions.

Recently, a new school of thought, known as inter-frame conditional coding, has emerged, taking end-to-end learned video coding to a new level of compression performance. The idea of conditional coding is to learn the distribution of a coding frame conditioned on useful contextual information, in order to reach a lower conditional entropy rate for better compression.

The emergence of deep generative models, such as variational autoencoders (VAE) and normalizing flow models, opens up new opportunities for a paradigm shift in learning-based compression. Currently, the VAE is a popular choice for the compression backbone. As a new attempt, this joint research proposal introduces a special type of normalizing flow model, called augmented normalizing flows (ANF), for conditional coding. We choose ANF because it has been shown to be more expressive than the VAE and includes the VAE as a special case.

Another notable aspect of this joint research project is to address the generalizability and adaptability of learned video codecs. Learned codecs often suffer from the domain gap between the training and the test data; that is, they may not generalize well to unseen data. More generally, they can hardly achieve optimal compression for individual test images or videos, each of which can in fact be considered a distinct domain. To improve generalizability, this proposal shall incorporate Noether's theorem in the form of meta learning to learn an inductive bias that encourages decoded video frames to maintain a certain latent consistency along the temporal dimension. We shall also use this learned inductive bias to adapt the encoder and/or the decoder at inference time to suit individual videos. Owing to its unsupervised nature, our approach has the striking feature of not having to signal any additional information in the bitstream.
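
The conditional-coding idea above rests on the information-theoretic fact that conditioning never increases entropy, H(x_t | c_t) <= H(x_t): a latent that only carries what the context c_t cannot predict can be coded at a lower rate. The following minimal PyTorch sketch illustrates this by conditioning both encoder and decoder of a toy autoencoder on a temporal context c_t (e.g., a motion-compensated reference frame). It is a hypothetical illustration under simple assumptions, not the project's ANF-based codec; all module names and layer choices are invented for the example, and additive uniform noise stands in for quantization.

# Minimal sketch of inter-frame conditional coding (hypothetical; not the
# project's ANF-based design). The current frame x_t is encoded and decoded
# conditioned on a temporal context c_t, so the latent only needs to carry
# what c_t cannot already predict.
import torch
import torch.nn as nn

class ConditionalAutoencoder(nn.Module):
    def __init__(self, channels=64):
        super().__init__()
        # Encoder sees the frame together with its context.
        self.encoder = nn.Sequential(
            nn.Conv2d(3 + 3, channels, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(channels, channels, 5, stride=2, padding=2),
        )
        # A small network summarizes the context for the decoder.
        self.context_enc = nn.Sequential(
            nn.Conv2d(3, channels, 5, stride=4, padding=2), nn.ReLU(),
        )
        # Decoder reconstructs the frame from the latent and the context.
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(2 * channels, channels, 5, stride=2,
                               padding=2, output_padding=1), nn.ReLU(),
            nn.ConvTranspose2d(channels, 3, 5, stride=2,
                               padding=2, output_padding=1),
        )

    def forward(self, x_t, c_t):
        y = self.encoder(torch.cat([x_t, c_t], dim=1))        # conditional latent
        y_hat = y + torch.rand_like(y) - 0.5                   # noise proxy for quantization
        ctx = self.context_enc(c_t)
        x_hat = self.decoder(torch.cat([y_hat, ctx], dim=1))  # conditional reconstruction
        return x_hat, y_hat

model = ConditionalAutoencoder()
x_t = torch.rand(1, 3, 64, 64)   # current frame
c_t = torch.rand(1, 3, 64, 64)   # temporal context, e.g. a warped previous frame
x_hat, y_hat = model(x_t, c_t)
print(x_hat.shape)               # torch.Size([1, 3, 64, 64])

In a real codec, the noise proxy would be replaced by hard quantization and an entropy model conditioned on c_t would estimate the bits needed for y_hat.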
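The inference-time adaptation mentioned above can be sketched generically as encoder-side latent refinement: the latent for one specific frame is overfitted with the decoder held fixed, so the bitstream syntax is unchanged and nothing extra needs to be signalled. The snippet below is a hedged, generic illustration of that principle with a crude rate proxy; it does not implement the project's Noether/meta-learning regularization, and all names are invented for the example.

# Hypothetical encoder-side (latent) refinement at inference time. Only the
# latent is adapted; the decoder stays frozen, so the decoder-side behaviour
# and the bitstream syntax are unchanged. The rate proxy is a simple L2 term,
# a crude stand-in for -log p(y); the project's meta-learned inductive bias
# is not modelled here.
import torch

def refine_latent(y_init, x_t, decode_fn, steps=50, lr=1e-2, lam=0.01):
    """Overfit the latent y to a single frame x_t with the decoder frozen."""
    y = y_init.clone().detach().requires_grad_(True)
    opt = torch.optim.Adam([y], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        x_hat = decode_fn(y)                        # frozen decoder
        distortion = torch.mean((x_hat - x_t) ** 2)
        rate_proxy = torch.mean(y ** 2)             # stand-in for the rate term
        loss = distortion + lam * rate_proxy
        loss.backward()
        opt.step()
    return y.detach()

# Toy usage with a fixed random "decoder" (a frozen transposed convolution).
decoder = torch.nn.ConvTranspose2d(8, 3, 4, stride=4)
for p in decoder.parameters():
    p.requires_grad_(False)

x_t = torch.rand(1, 3, 64, 64)
y0 = torch.randn(1, 8, 16, 16)
y_star = refine_latent(y0, x_t, lambda y: decoder(y))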
DFG Programme Research Grants
International Connection Taiwan
Cooperation Partner Professor Wen-Hsiao Peng, Ph.D.
 
 
