Transform coding methods play a fundamental role in image and video coding technologies such as the Versatile Video Coding (VVC) standard. Typically, the employed transforms are linear maps with strong energy compaction capabilities, so efficient quantization and entropy coding methods can be designed for transmitting and storing the transform coefficients. In recent years, considerable effort has gone into designing coding-efficient transforms with learning-based methods. For video compression, additional bitrate savings are achieved by optimizing linear block transforms with respect to the different intra prediction modes. In contrast, end-to-end optimized image codecs have been obtained from deep-learning experiments. Learned codecs such as the JPEG AI coding standard rely on non-linear neural networks as forward and inverse transforms. Remarkably, JPEG AI is reported to achieve superior compression efficiency relative to conventional still image coding.

Since the transforms in learned image compression are non-linear, it is not clear whether rate-distortion optimization methods designed for linear block transforms remain well-suited. This thesis therefore studies the impact of different signal-dependent encoder optimizations on quantization when a learned image codec is used. As a main result, an algorithm for rate-distortion optimized scalar quantization is developed which achieves bitrate savings between 1 % and 7 %. Furthermore, a rate-constrained vector quantizer is shown to improve the coding efficiency on a similar scale. Its design has similarities with the trellis-coded quantization stage in VVC.

Since rate-constrained quantization is thus shown to be effective when applied to non-linear transforms, different non-linear transform coding tools for block-based video compression are developed. These tools employ neural networks obtained from a data-driven optimization. The first tool, a non-linear coefficient prediction, uses reconstructed coefficients and the reference samples from the block boundary to predict low-frequency coefficients; hence, only the difference between the predicted value and the original coefficient is quantized and coded. The second tool, a non-linear transform offset, is applied after reconstructing all coefficients and also takes the reference samples as input. The offset is added before the synthesis transform and is trained to improve the reconstruction quality. A combination of both methods yields coding gains between 1.0 % and 2.8 % over VVC in the All-Intra configuration.

Finally, non-linear transforms and intra modes are obtained from an end-to-end training method. The learned transforms do not depend on the reference samples. The training goal is to minimize the expected rate-distortion cost by using an approximation of the transform coefficients' bitrate. The learned transforms and intra modes achieve average All-Intra bitrate savings of 0.9 % over VVC.
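
To make the central idea of rate-distortion optimized scalar quantization concrete, the following is a minimal illustrative sketch, not the algorithm developed in this thesis: for each transform coefficient, a few candidate quantization indices are tested and the one minimizing the Lagrangian cost D + λ·R is kept. The rate model `bits()` and the names `rdoq`, `step`, and `lam` are hypothetical stand-ins chosen for illustration.

```python
import numpy as np

def bits(q: int) -> float:
    """Hypothetical rate model: a cheap proxy for the entropy coder's
    cost (in bits) of transmitting the quantization index q."""
    return 1.0 if q == 0 else 1.0 + 2.0 * np.log2(1.0 + abs(q))

def rdoq(coeffs: np.ndarray, step: float, lam: float) -> np.ndarray:
    """Per coefficient, test the zero index and the two nearest indices,
    and keep the one with the smallest cost D + lam * R."""
    flat = coeffs.ravel()
    out = np.zeros(flat.shape, dtype=np.int64)
    for i, c in enumerate(flat):
        q_floor = int(np.floor(c / step))
        best_q, best_cost = 0, np.inf
        for q in (0, q_floor, q_floor + 1):
            dist = (c - q * step) ** 2          # squared reconstruction error
            cost = dist + lam * bits(q)         # Lagrangian rate-distortion cost
            if cost < best_cost:
                best_q, best_cost = q, cost
        out[i] = best_q
    return out.reshape(coeffs.shape)

coeffs = np.array([12.7, -0.4, 3.1, 0.9])
print(rdoq(coeffs, step=2.0, lam=4.0))  # -> [6 0 1 0]
```

With a larger λ, more coefficients are forced to zero because the rate term dominates; this trade-off between distortion and rate is the knob that signal-dependent encoder optimizations of this kind turn.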