dc.contributor.author
Ammann, Clemens
dc.contributor.author
Hadler, Thomas
dc.contributor.author
Gröschel, Jan
dc.contributor.author
Kolbitsch, Christoph
dc.contributor.author
Schulz-Menger, Jeanette
dc.date.accessioned
2023-09-11T15:47:51Z
dc.date.available
2023-09-11T15:47:51Z
dc.identifier.uri
https://refubium.fu-berlin.de/handle/fub188/40826
dc.identifier.uri
http://dx.doi.org/10.17169/refubium-40547
dc.description.abstract
Background: Cardiac function quantification in cardiovascular magnetic resonance requires precise contouring of the heart chambers. This time-consuming task is increasingly being addressed by a plethora of ever more complex deep learning methods. However, only a small fraction of these have made their way from academia into clinical practice. In the quality assessment and control of medical artificial intelligence, the opaque reasoning and associated distinctive errors of neural networks meet an extraordinarily low tolerance for failure.
Aim: The aim of this study is a multilevel analysis and comparison of the performance of three popular convolutional neural network (CNN) models for cardiac function quantification.
Methods: U-Net, FCN, and MultiResUNet were trained for the segmentation of the left and right ventricles on short-axis cine images of 119 patients from clinical routine. The training pipeline and hyperparameters were kept constant to isolate the influence of network architecture. CNN performance was evaluated against expert segmentations for 29 test cases at the contour level and in terms of quantitative clinical parameters. Multilevel analysis included breakdown of results by slice position, as well as visualization of segmentation deviations and linkage of volume differences to segmentation metrics via correlation plots for qualitative analysis.
Results: All models showed strong correlation to the expert with respect to quantitative clinical parameters (r_z' = 0.978, 0.977, 0.978 for U-Net, FCN, and MultiResUNet, respectively). The MultiResUNet significantly underestimated ventricular volumes and left ventricular myocardial mass. Segmentation difficulties and failures clustered in basal and apical slices for all CNNs, with the largest volume differences in the basal slices (mean absolute error per slice: 4.2 ± 4.5 ml for basal, 0.9 ± 1.3 ml for midventricular, 0.9 ± 0.9 ml for apical slices). Results for the right ventricle had higher variance and more outliers compared to the left ventricle. Intraclass correlation for clinical parameters was excellent (≥ 0.91) among the CNNs.
Conclusion: Modifications to CNN architecture were not critical to the quality of error for our dataset. Despite good overall agreement with the expert, errors accumulated in basal and apical slices for all models.
en
dc.rights.uri
https://creativecommons.org/licenses/by/4.0/
dc.subject
cardiovascular magnetic resonance
en
dc.subject
artificial intelligence
en
dc.subject
deep learning
en
dc.subject
cardiac image segmentation
en
dc.subject
cardiac function quantification
en
dc.subject
quality control
en
dc.subject.ddc
600 Technology, Medicine, Applied Sciences::610 Medicine and Health::610 Medicine and Health
dc.title
Multilevel comparison of deep learning models for function quantification in cardiovascular magnetic resonance: On the redundancy of architectural variations
dc.type
Scientific article
dcterms.bibliographicCitation.articlenumber
1118499
dcterms.bibliographicCitation.doi
10.3389/fcvm.2023.1118499
dcterms.bibliographicCitation.journaltitle
Frontiers in Cardiovascular Medicine
dcterms.bibliographicCitation.originalpublishername
Frontiers Media SA
dcterms.bibliographicCitation.volume
10
refubium.affiliation
Charité - Universitätsmedizin Berlin
refubium.resourceType.isindependentpub
no
dcterms.accessRights.openaire
open access
dcterms.bibliographicCitation.pmid
37144061
dcterms.isPartOf.eissn
2297-055X