dc.contributor.author
Kauffmann, Jacob
dc.contributor.author
Dippel, Jonas
dc.contributor.author
Ruff, Lukas
dc.contributor.author
Samek, Wojciech
dc.contributor.author
Müller, Klaus-Robert
dc.contributor.author
Montavon, Grégoire
dc.date.accessioned
2025-04-11T12:51:02Z
dc.date.available
2025-04-11T12:51:02Z
dc.identifier.uri
https://refubium.fu-berlin.de/handle/fub188/47344
dc.identifier.uri
http://dx.doi.org/10.17169/refubium-47062
dc.description.abstract
Unsupervised learning has become an essential building block of artificial intelligence systems. The representations it produces, for example, in foundation models, are critical to a wide variety of downstream applications. It is therefore important to carefully examine unsupervised models to ensure not only that they produce accurate predictions on the available data but also that these accurate predictions do not arise from a Clever Hans (CH) effect. Here, using specially developed explainable artificial intelligence techniques and applying them to popular representation learning and anomaly detection models for image data, we show that CH effects are widespread in unsupervised learning. In particular, through use cases on medical and industrial inspection data, we demonstrate that CH effects systematically lead to significant performance loss of downstream models under plausible dataset shifts or reweighting of different data subgroups. Our empirical findings are enriched by theoretical insights, which point to inductive biases in the unsupervised learning machine as a primary source of CH effects. Overall, our work sheds light on unexplored risks associated with practical applications of unsupervised learning and suggests ways to systematically mitigate CH effects, thereby making unsupervised learning more robust.
en
dc.format.extent
14 pages
dc.rights.uri
https://creativecommons.org/licenses/by/4.0/
dc.subject
Computer science
en
dc.subject
Scientific data
en
dc.subject.ddc
000 Computer science, information and general works::000 Computer science, knowledge and systems::004 Data processing; computer science
dc.title
Explainable AI reveals Clever Hans effects in unsupervised learning models
dc.type
Scientific article
dcterms.bibliographicCitation.doi
10.1038/s42256-025-01000-2
dcterms.bibliographicCitation.journaltitle
Nature Machine Intelligence
dcterms.bibliographicCitation.number
3
dcterms.bibliographicCitation.pagestart
412
dcterms.bibliographicCitation.pageend
422
dcterms.bibliographicCitation.volume
7
dcterms.bibliographicCitation.url
https://doi.org/10.1038/s42256-025-01000-2
refubium.affiliation
Mathematik und Informatik
refubium.affiliation.other
Institut für Informatik
refubium.resourceType.isindependentpub
no
dcterms.accessRights.openaire
open access
dcterms.isPartOf.eissn
2522-5839
refubium.resourceType.provider
WoS-Alert