dc.contributor.author
Linhardt, Lorenz
dc.contributor.author
Müller, Klaus-Robert
dc.contributor.author
Montavon, Grégoire
dc.date.accessioned
2024-01-19T08:32:41Z
dc.date.available
2024-01-19T08:32:41Z
dc.identifier.uri
https://refubium.fu-berlin.de/handle/fub188/42106
dc.identifier.uri
http://dx.doi.org/10.17169/refubium-41831
dc.description.abstract
Robustness has become an important consideration in deep learning. With the help of explainable AI, mismatches between an explained model’s decision strategy and the user’s domain knowledge (e.g. Clever Hans effects) have been identified as a starting point for improving faulty models. However, it is less clear what to do when the user and the explanation agree. In this paper, we demonstrate that acceptance of explanations by the user is not a guarantee for a machine learning model to be robust against Clever Hans effects, which may remain undetected. Such hidden flaws of the model can nevertheless be mitigated, and we demonstrate this by contributing a new method, Explanation-Guided Exposure Minimization (EGEM), that preemptively prunes variations in the ML model that have not been the subject of positive explanation feedback. Experiments demonstrate that our approach leads to models that strongly reduce their reliance on hidden Clever Hans strategies, and consequently achieve higher accuracy on new data.
en
dc.format.extent
14 pages
dc.rights.uri
https://creativecommons.org/licenses/by/4.0/
dc.subject
Clever Hans effect
en
dc.subject
Model refinement
en
dc.subject
Explainable AI
en
dc.subject
Deep neural networks
en
dc.subject.ddc
000 Computer science, information and general works::000 Computer science, knowledge, systems::004 Data processing; computer science
dc.title
Preemptively pruning Clever-Hans strategies in deep neural networks
dc.type
Scientific article
dcterms.bibliographicCitation.articlenumber
102094
dcterms.bibliographicCitation.doi
10.1016/j.inffus.2023.102094
dcterms.bibliographicCitation.journaltitle
Information Fusion
dcterms.bibliographicCitation.volume
103
dcterms.bibliographicCitation.url
https://doi.org/10.1016/j.inffus.2023.102094
refubium.affiliation
Mathematik und Informatik
refubium.affiliation.other
Institut für Informatik
refubium.resourceType.isindependentpub
no
dcterms.accessRights.openaire
open access
dcterms.isPartOf.eissn
1872-6305
refubium.resourceType.provider
WoS-Alert