dc.contributor.author
Merkle, Philipp
dc.contributor.author
Winken, Martin
dc.contributor.author
Pfaff, Jonathan
dc.contributor.author
Schwarz, Heiko
dc.contributor.author
Marpe, Detlev
dc.contributor.author
Wiegand, Thomas
dc.date.accessioned
2024-10-15T12:53:27Z
dc.date.available
2024-10-15T12:53:27Z
dc.identifier.uri
https://refubium.fu-berlin.de/handle/fub188/45277
dc.identifier.uri
http://dx.doi.org/10.17169/refubium-44989
dc.description.abstract
This paper presents a convolutional neural network (CNN)-based enhancement to inter prediction in Versatile Video Coding (VVC). Our approach aims to improve the prediction signal of inter blocks with a residual CNN that incorporates spatial and temporal reference samples. It is motivated by two theoretical considerations: neural network-based methods have a higher degree of signal adaptivity than conventional signal processing methods, and spatially neighboring reference samples have the potential to improve the prediction signal by adapting it to the reconstructed signal in its immediate vicinity. We show that adding a polyphase decomposition stage to the CNN results in a significantly better trade-off between computational complexity and coding performance. Incorporating spatial reference samples in the inter prediction process is challenging: because the input of the CNN for one block may depend on the output of the CNN for preceding blocks, parallel processing is prohibited. We solve this by introducing a novel signal plane that contains specifically constrained reference samples, enabling parallel decoding while maintaining high compression efficiency. Overall, experimental results show average bit rate savings of 4.07% and 3.47% for the random access (RA) and low-delay B (LB) configurations of the JVET common test conditions, respectively.
en
dc.format.extent
15 pages
dc.rights.uri
https://creativecommons.org/licenses/by/4.0/
dc.subject
Inter prediction
en
dc.subject
convolutional neural network
en
dc.subject
intra reference samples
en
dc.subject
versatile video coding standard
en
dc.subject.ddc
000 Computer science, information, and general works::000 Computer science, knowledge, and systems::004 Data processing; computer science
dc.title
Spatio-Temporal Convolutional Neural Network for Enhanced Inter Prediction in Video Coding
dc.type
Scientific article
dcterms.bibliographicCitation.doi
10.1109/TIP.2024.3446228
dcterms.bibliographicCitation.journaltitle
IEEE Transactions on Image Processing
dcterms.bibliographicCitation.pagestart
4738
dcterms.bibliographicCitation.pageend
4752
dcterms.bibliographicCitation.volume
33
dcterms.bibliographicCitation.url
https://doi.org/10.1109/TIP.2024.3446228
refubium.affiliation
Mathematics and Computer Science
refubium.affiliation.other
Institute of Computer Science
refubium.resourceType.isindependentpub
no
dcterms.accessRights.openaire
open access
dcterms.isPartOf.eissn
1941-0042
refubium.resourceType.provider
WoS-Alert