Spatio-Temporal Convolutional Neural Network for Enhanced Inter Prediction in Video Coding

Merkle, Philipp; Winken, Martin; Pfaff, Jonathan; Schwarz, Heiko; Marpe, Detlev; Wiegand, Thomas

doi:10.1109/TIP.2024.3446228

Main title:

Spatio-Temporal Convolutional Neural Network for Enhanced Inter Prediction in Video Coding

Author(s):

Merkle, Philipp; Winken, Martin; Pfaff, Jonathan; Schwarz, Heiko; Marpe, Detlev; Wiegand, Thomas

Year of publication:

2024

Release date:

2024-10-15T12:53:27Z

Abstract:

This paper presents a convolutional neural network (CNN)-based enhancement to inter prediction in Versatile Video Coding (VVC). Our approach aims to improve the prediction signal of inter blocks with a residual CNN that incorporates spatial and temporal reference samples. It is motivated by two theoretical considerations: neural network-based methods have a higher degree of signal adaptivity than conventional signal processing methods, and spatially neighboring reference samples have the potential to improve the prediction signal by adapting it to the reconstructed signal in its immediate vicinity. We show that adding a polyphase decomposition stage to the CNN results in a significantly better trade-off between computational complexity and coding performance. Incorporating spatial reference samples in the inter prediction process is challenging: because the input of the CNN for one block may depend on the output of the CNN for preceding blocks, parallel processing is not possible. We solve this by introducing a novel signal plane that contains specifically constrained reference samples, enabling parallel decoding while maintaining high compression efficiency. Overall, experimental results show average bit rate savings of 4.07% and 3.47% for the random access (RA) and low-delay B (LB) configurations of the JVET common test conditions, respectively.
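
The polyphase decomposition mentioned in the abstract can be illustrated with a minimal sketch. It assumes a subsampling factor of 2 in each dimension and a plain list-of-rows block representation; the abstract states neither, and the function names `polyphase_decompose` and `polyphase_reconstruct` are hypothetical. The idea shown is only the general technique: a block is split into subsampled phases that a network could process as separate channels at reduced resolution, and the phases can be interleaved back without loss.

```python
def polyphase_decompose(block, factor=2):
    """Split a 2D block (list of rows) into factor*factor subsampled phases."""
    height, width = len(block), len(block[0])
    if height % factor or width % factor:
        raise ValueError("block dimensions must be multiples of factor")
    # Phase (dy, dx) keeps every factor-th sample starting at offset (dy, dx).
    return [
        [row[dx::factor] for row in block[dy::factor]]
        for dy in range(factor)
        for dx in range(factor)
    ]


def polyphase_reconstruct(phases, factor=2):
    """Interleave the phases back into the full-resolution block."""
    sub_h, sub_w = len(phases[0]), len(phases[0][0])
    block = [[0] * (sub_w * factor) for _ in range(sub_h * factor)]
    for index, phase in enumerate(phases):
        # Phases are stored in row-major order of their (dy, dx) offsets.
        dy, dx = divmod(index, factor)
        for y, row in enumerate(phase):
            for x, sample in enumerate(row):
                block[y * factor + dy][x * factor + dx] = sample
    return block


# Assumed toy input: a 4x4 block with sample values 0..15.
block = [[4 * y + x for x in range(4)] for y in range(4)]
phases = polyphase_decompose(block)
```

With the assumed factor of 2, a 4x4 block becomes four 2x2 phases, so each convolution visits a quarter of the spatial positions while all original samples remain available across the phases.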

Identifier:

https://refubium.fu-berlin.de/handle/fub188/45277
http://dx.doi.org/10.17169/refubium-44989

Part of identifier:

e-ISSN (online): 1941-0042

Language:

English

Keywords:

Inter prediction
convolutional neural network
intra reference samples
versatile video coding standard

DDC classification:

004 Data processing; computer science

Publication type:

Scientific article