dc.contributor.author
Schäfer, Michael
dc.date.accessioned
2025-12-05T08:38:12Z
dc.date.available
2025-12-05T08:38:12Z
dc.identifier.uri
https://refubium.fu-berlin.de/handle/fub188/50534
dc.identifier.uri
http://dx.doi.org/10.17169/refubium-50261
dc.description.abstract
Transform coding methods play a fundamental role in image and video coding technologies such as the Versatile Video Coding (VVC) standard. Typically, the employed transforms are linear maps with strong energy compaction capabilities, which allows efficient quantization and entropy coding methods to be designed for transmitting and storing the transform coefficients. In recent years, there have been considerable efforts to design coding-efficient transforms using learning-based methods. In video compression, additional bitrate savings are achieved by optimizing linear block transforms for the different intra prediction modes. In contrast, end-to-end optimized image codecs have been obtained through deep learning. Learned codecs such as the JPEG AI coding standard use non-linear neural networks as forward and inverse transforms. Remarkably, JPEG AI is reported to achieve higher compression efficiency than conventional still image coding. Since the transforms in learned image compression are non-linear, it is not clear whether rate-distortion optimization methods designed for linear block transforms are well suited. This thesis therefore studies the impact of different signal-dependent encoder optimizations of the quantization stage when a learned image codec is used. As a main result, an algorithm for rate-distortion optimized scalar quantization is developed that achieves bitrate savings between 1 % and 7 %. Furthermore, it is shown that a rate-constrained vector quantizer improves the coding efficiency by a similar amount. Its design resembles the trellis-coded quantization stage in VVC. Since rate-constrained quantization proves effective for non-linear transforms, several non-linear transform coding tools for block-based video compression are then developed. These tools employ neural networks obtained by data-driven optimization.
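Rate-distortion optimized scalar quantization, in its classic form for independent coefficients, can be illustrated by a minimal Python sketch. It assumes a uniform reconstruction quantizer with step size delta, squared error measured directly in the coefficient domain, and a hypothetical toy rate model rate_bits (an exp-Golomb-like code length); none of these details are taken from the thesis, whose algorithm targets learned codecs where coefficient-domain error need not match image-domain error after a non-linear synthesis transform. For each coefficient, the sketch compares the magnitudes 0, floor(|c|/delta) and ceil(|c|/delta) and keeps the one with the smallest Lagrangian cost D + lam * R.

```python
import math


def rate_bits(level: int) -> float:
    """Hypothetical toy rate model (not from the thesis):
    1 bit for a zero level; otherwise 1 significance bit,
    an exp-Golomb-like length for the magnitude, and 1 sign bit."""
    if level == 0:
        return 1.0
    magnitude = abs(level)
    return 1.0 + (2 * (magnitude.bit_length() - 1) + 1) + 1.0


def rdoq(coeffs, delta, lam):
    """Rate-distortion optimized scalar quantization of independent coefficients.

    For each coefficient c, the candidate magnitudes are 0, floor(|c|/delta)
    and ceil(|c|/delta); the one minimizing the Lagrangian cost
    (|c| - m*delta)^2 + lam * rate_bits(m) is kept and the sign of c restored.
    """
    levels = []
    for c in coeffs:
        a = abs(c) / delta
        sign = 1 if c >= 0 else -1
        # Sorted so that ties resolve to the smaller magnitude (min keeps the first minimum).
        candidates = sorted({0, math.floor(a), math.ceil(a)})
        best = min(
            candidates,
            key=lambda m: (abs(c) - m * delta) ** 2 + lam * rate_bits(m),
        )
        levels.append(sign * best)
    return levels


if __name__ == "__main__":
    coeffs = [0.4, 1.6, -2.2]
    for lam in (0.0, 0.2, 10.0):
        print(lam, rdoq(coeffs, delta=1.0, lam=lam))
```

With lam = 0 the sketch reduces to rounding to the nearest level; increasing lam trades distortion for fewer bits. For example, 1.6 is quantized to level 2 at lam = 0 but to level 1 at lam = 0.2, and every coefficient above is set to 0 at lam = 10.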
The first tool, a non-linear coefficient prediction, uses reconstructed coefficients and the reference samples from the block boundary to predict low-frequency coefficients, so that only the difference between the predicted value and the original coefficient is quantized and coded. The second tool, a non-linear transform offset, is applied after all coefficients have been reconstructed and also takes the reference samples as input. The offset is added before the synthesis transform and is trained to improve the reconstruction quality. Combining both methods yields coding gains between 1.0 % and 2.8 % over VVC in the All-Intra configuration. Finally, non-linear transforms and intra modes are obtained by end-to-end training. The learned transforms do not depend on the reference samples. The training objective is to minimize the expected rate-distortion cost, using an approximation of the bitrate of the transform coefficients. The learned transforms and intra modes achieve average All-Intra bitrate savings of 0.9 % over VVC.
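The predictive coefficient coding of the first tool can be sketched in Python under simplifying assumptions. The neural-network predictor, which uses reconstructed coefficients and boundary reference samples, is replaced here by a given list of predictions; the uniform quantizer with step size delta, the function names, and the numbers are hypothetical and only illustrative. The encoder quantizes only the residual c - p, and the decoder, which can form the same prediction p from already decoded data, reconstructs p + level * delta.

```python
def encode_residuals(coeffs, preds, delta):
    """Quantize only the prediction residuals c - p (uniform quantizer, step delta).

    Python's round() rounds to the nearest integer, with ties going to the even one.
    """
    return [round((c - p) / delta) for c, p in zip(coeffs, preds)]


def reconstruct(levels, preds, delta):
    """Decoder side: add the dequantized residual back to the same prediction."""
    return [p + k * delta for k, p in zip(levels, preds)]


if __name__ == "__main__":
    coeffs = [10.2, -3.9]  # illustrative low-frequency coefficients
    preds = [9.8, -4.0]    # stand-in for the hypothetical predictor output
    delta = 0.5
    levels = encode_residuals(coeffs, preds, delta)
    print(levels)                                        # levels with prediction
    print(encode_residuals(coeffs, [0.0, 0.0], delta))   # levels without prediction
    print(reconstruct(levels, preds, delta))
```

With a good prediction, the coded levels shrink from [20, -8] to [1, 0] for the same step size, which is where the bitrate savings of such a tool would come from, while the reconstruction error stays within delta / 2.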
de
dc.format.extent
xxiv, 196 pages
dc.rights.uri
http://www.fu-berlin.de/sites/refubium/rechtliches/Nutzungsbedingungen
dc.subject
video coding
en
dc.subject
transform coding
en
dc.subject
image coding
en
dc.subject
neural networks
en
dc.subject
machine learning
en
dc.subject
versatile video coding
en
dc.subject.ddc
000 Computer science, information science, general works::000 Computer science, knowledge, systems::004 Data processing; computer science
dc.title
Non-linear Data-driven Transforms for Visual Data Compression
dc.contributor.gender
male
dc.contributor.firstReferee
Schwarz, Heiko
dc.contributor.furtherReferee
Göhring, Daniel
dc.contributor.furtherReferee
Ballé, Jona
dc.date.accepted
2025-11-24
dc.identifier.urn
urn:nbn:de:kobv:188-refubium-50534-3
dc.title.translated
Nichtlineare Datengetriebene Transformationen zur Kompression Visueller Daten
ger
refubium.affiliation
Mathematik und Informatik
dcterms.accessRights.dnb
free
dcterms.accessRights.openaire
open access