Non-linear Data-driven Transforms for Visual Data Compression

Schäfer, Michael

Metadata

dc.contributor.author

Schäfer, Michael

dc.date.accessioned

2025-12-05T08:38:12Z

dc.date.available

2025-12-05T08:38:12Z

dc.date.issued

2025

dc.identifier.uri

https://refubium.fu-berlin.de/handle/fub188/50534

dc.identifier.uri

http://dx.doi.org/10.17169/refubium-50261

dc.description.abstract

Transform coding methods play a fundamental role in image and video coding technologies such as the Versatile Video Coding (VVC) standard. Typically, the employed transforms are linear maps with strong energy compaction capabilities, so that efficient quantization and entropy coding methods can be designed for transmitting and storing the transform coefficients. In recent years, there have been considerable efforts to design coding-efficient transforms using learning-based methods. In video compression, additional bitrate savings are achieved by optimizing linear block transforms with respect to the different intra prediction modes. In contrast, end-to-end optimized image codecs have been obtained from deep-learning experiments. Learned codecs such as the JPEG AI coding standard rely on non-linear neural networks as forward and inverse transforms. Remarkably, JPEG AI is reported to have superior compression efficiency relative to conventional still image coding. Since the transforms in learned image compression are non-linear, it is not clear whether rate-distortion optimization methods designed for linear block transforms are well suited to them. Thus, this thesis studies the impact of different signal-dependent encoder optimizations on the quantization when a learned image codec is used. As a main result, an algorithm for rate-distortion optimized scalar quantization is developed, which achieves bitrate savings between 1 % and 7 %. Furthermore, it is shown that a rate-constrained vector quantizer improves the coding efficiency on a similar scale. Its design has similarities with the trellis-coded quantization stage in VVC. Since rate-constrained quantization is thus shown to be effective when applied to non-linear transforms, different non-linear transform coding tools for block-based video compression are developed. These tools employ neural networks which are obtained from a data-driven optimization.
The first tool, a non-linear coefficient prediction, uses reconstructed coefficients and the reference samples from the block boundary to predict low-frequency coefficients. Consequently, only the difference between the predicted value and the original coefficient is quantized and coded. The second tool, a non-linear transform offset, is applied after all coefficients have been reconstructed and also takes the reference samples as input. The offset is added before the synthesis transform and has been trained to improve the reconstruction quality. A combination of both methods yields coding gains between 1.0 % and 2.8 % over VVC in All-Intra configuration. Finally, non-linear transforms and intra modes are obtained from an end-to-end training method. The learned transforms do not depend on the reference samples. The training goal is to minimize the expected rate-distortion cost by using an approximation of the transform coefficients' bitrate. The average All-Intra bitrate savings of the learned transforms and intra modes are 0.9 % relative to VVC.
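The idea of rate-distortion optimized scalar quantization mentioned in the abstract can be illustrated with a minimal sketch. It is not the algorithm developed in the thesis: the function names `rate_bits` and `rdo_quantize`, the logarithmic rate model, and the squared-error distortion are all assumptions chosen for illustration, whereas a real codec would take the rate from its entropy model. For each coefficient, the sketch compares a few candidate quantization levels and keeps the one with the smallest Lagrangian cost D + lam * R.

```python
import math


def rate_bits(level: int) -> float:
    """Hypothetical rate model (an assumption): larger magnitudes cost more bits."""
    return 1.0 + 2.0 * math.log2(1 + abs(level))


def rdo_quantize(coeffs, step, lam):
    """Per coefficient, pick the level minimizing D + lam * R.

    D is the squared reconstruction error for quantization step size `step`,
    R comes from the toy model above, and lam is the Lagrange multiplier.
    """
    levels = []
    for c in coeffs:
        lower = math.floor(c / step)
        # Candidates: the two neighbouring levels and zero.
        candidates = {lower, lower + 1, 0}
        best = min(
            candidates,
            # Ties in cost are broken in favour of the smaller magnitude.
            key=lambda q: ((c - q * step) ** 2 + lam * rate_bits(q), abs(q)),
        )
        levels.append(best)
    return levels


if __name__ == "__main__":
    coeffs = [0.4, 2.6, -1.2]
    for lam in (0.0, 0.1, 100.0):
        print(lam, rdo_quantize(coeffs, step=1.0, lam=lam))
```

With `lam = 0` the sketch reduces to nearest-level rounding, and a very large `lam` drives every level to zero. Values in between trade reconstruction error against the estimated rate.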

dc.format.extent

xxiv, 196 pages

dc.language

eng

dc.rights.uri

http://www.fu-berlin.de/sites/refubium/rechtliches/Nutzungsbedingungen

dc.subject

video coding

dc.subject

transform coding

dc.subject

image coding

dc.subject

neural networks

dc.subject

machine learning

dc.subject

versatile video coding

dc.subject.ddc

000 Computer science, information and general works::000 Computer science, knowledge, systems::004 Data processing; computer science

dc.title

Non-linear Data-driven Transforms for Visual Data Compression

dc.type

Dissertation

dcterms.format

Text

dc.contributor.gender

male

dc.contributor.firstReferee

Schwarz, Heiko

dc.contributor.furtherReferee

Göhring, Daniel

dc.contributor.furtherReferee

Ballé, Jona

dc.date.accepted

2025-11-24

dc.identifier.urn

urn:nbn:de:kobv:188-refubium-50534-3

dc.title.translated

Nichtlineare Datengetriebene Transformationen zur Kompression Visueller Daten

ger

refubium.affiliation

Mathematics and Computer Science

dcterms.accessRights.dnb

free

dcterms.accessRights.openaire

open access

This document appears in:

Dissertationen FU

Files for this resource

Schäfer_Michael_thesis.pdf

Size: 10.10 MB

Format: PDF

Checksum (MD5): 5040ab00bca60807b68df75ebc8e93ec

Refubium - Repository of Freie Universität Berlin