dc.contributor.author
Chang, Crystal T.
dc.contributor.author
Farah, Hodan
dc.contributor.author
Gui, Haiwen
dc.contributor.author
Rezaei, Shawheen Justin
dc.contributor.author
Bou-Khalil, Charbel
dc.contributor.author
Park, Ye-Jean
dc.contributor.author
Swaminathan, Akshay
dc.contributor.author
Omiye, Jesutofunmi A.
dc.contributor.author
Kolluri, Akaash
dc.contributor.author
Behr, Solveig
dc.date.accessioned
2025-04-17T05:09:10Z
dc.date.available
2025-04-17T05:09:10Z
dc.identifier.uri
https://refubium.fu-berlin.de/handle/fub188/47412
dc.identifier.uri
http://dx.doi.org/10.17169/refubium-47130
dc.description.abstract
Red teaming, the practice of adversarially exposing unexpected or undesired model behaviors, is critical to improving the equity and accuracy of large language models, but red teaming by parties not affiliated with model creators is scant in healthcare. We convened teams of clinicians, medical and engineering students, and technical professionals (80 participants total) to stress-test models with real-world clinical cases and categorize inappropriate responses along axes of safety, privacy, hallucinations/accuracy, and bias. Six medically-trained reviewers re-analyzed prompt-response pairs and added qualitative annotations. Of 376 unique prompts (1504 responses), 20.1% were inappropriate (GPT-3.5: 25.8%; GPT-4.0: 16%; GPT-4.0 with Internet: 17.8%). Subsequently, we show the utility of our benchmark by testing GPT-4o, a model released after our event (20.4% inappropriate). Of the responses that were appropriate with GPT-3.5, 21.5% were inappropriate in updated models. We share insights for constructing red teaming prompts, and present our benchmark for iterative model assessments.
en
dc.format.extent
10 pages
dc.rights.uri
https://creativecommons.org/licenses/by/4.0/
dc.subject
Computer science
en
dc.subject
Medical ethics
en
dc.subject.ddc
000 Computer science, information science, general works::000 Computer science, knowledge, systems::004 Data processing; computer science
dc.title
Red teaming ChatGPT in medicine to yield real-world insights on model behavior
dc.type
Scientific article
dcterms.bibliographicCitation.articlenumber
149
dcterms.bibliographicCitation.doi
10.1038/s41746-025-01542-0
dcterms.bibliographicCitation.journaltitle
npj Digital Medicine
dcterms.bibliographicCitation.number
1
dcterms.bibliographicCitation.volume
8
dcterms.bibliographicCitation.url
https://doi.org/10.1038/s41746-025-01542-0
refubium.affiliation
Erziehungswissenschaft und Psychologie
refubium.affiliation.other
Arbeitsbereich Klinisch-Psychologische Intervention

refubium.resourceType.isindependentpub
no
dcterms.accessRights.openaire
open access
dcterms.isPartOf.eissn
2398-6352
refubium.resourceType.provider
WoS-Alert