dc.contributor.author
Fillies, Jan
dc.contributor.author
Paschke, Adrian
dc.date.accessioned
2025-08-14T09:22:55Z
dc.date.available
2025-08-14T09:22:55Z
dc.identifier.uri
https://refubium.fu-berlin.de/handle/fub188/47159
dc.identifier.uri
http://dx.doi.org/10.17169/refubium-46877
dc.description.abstract
With the increasing presence of adolescents and children online, it is crucial to evaluate algorithms designed to protect them from physical and mental harm. This study measures the bias that emerging slurs found in youth language introduce into existing BERT-based hate speech detection models. The research establishes a novel framework to identify language bias within trained networks, introducing a technique to detect emerging hate phrases and to evaluate the unintended bias associated with them. As a result, three bias test sets are constructed: one for emerging hate speech terms, another for established hate terms, and one to test for overfitting. Based on these test sets, three scientific hate speech detection models and one commercial model are assessed and compared. For a comprehensive evaluation, the research introduces a novel Youth Language Bias Score. Finally, the study applies fine-tuning as a mitigation strategy for youth language bias, rigorously testing and evaluating the newly trained classifier. In summary, the research introduces a novel framework for bias detection, highlights the influence of adolescent language on classifier performance in hate speech classification, and presents the first hate speech classifier specifically trained for online youth language. This study focuses only on slurs in hateful speech, offering a foundational perspective for the field.
en
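As an illustrative sketch only (not taken from this record or the underlying article): one crude way to probe the kind of bias described in the abstract is to compare a BERT-based classifier's detection rate on template sentences filled with established versus emerging slur placeholders. The HuggingFace pipeline API used below is real; the model name, template sentences, term lists, and the simple rate gap are hypothetical placeholders and are not the paper's Youth Language Bias Score.

```python
# Minimal sketch: compare a hate speech classifier's flag rate on established vs. emerging terms.
from transformers import pipeline

# Hypothetical templates and term lists (placeholders, not from the paper).
TEMPLATES = ["I can't stand those <TERM> people.", "You are such a <TERM>."]
ESTABLISHED_TERMS = ["<established_slur_1>", "<established_slur_2>"]
EMERGING_TERMS = ["<emerging_slur_1>", "<emerging_slur_2>"]

def hate_rate(classifier, terms):
    """Fraction of filled-in template sentences the model flags as hateful/toxic."""
    sentences = [t.replace("<TERM>", term) for t in TEMPLATES for term in terms]
    results = classifier(sentences)
    flagged = sum(
        1 for r in results
        if r["label"].lower() in {"hate", "hateful", "toxic", "label_1"}
    )
    return flagged / len(results)

if __name__ == "__main__":
    # Any BERT-based hate speech model from the Hub could be plugged in here;
    # "bert-base-uncased" is only a placeholder model name.
    clf = pipeline("text-classification", model="bert-base-uncased")
    established = hate_rate(clf, ESTABLISHED_TERMS)
    emerging = hate_rate(clf, EMERGING_TERMS)
    # A large gap would suggest the model misses emerging youth-language slurs.
    print(f"detection rate, established terms: {established:.2f}")
    print(f"detection rate, emerging terms:    {emerging:.2f}")
    print(f"gap (one crude bias indicator):    {established - emerging:.2f}")
```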
dc.format.extent
13 pages
dc.rights.uri
https://creativecommons.org/licenses/by/4.0/
dc.subject
Hate speech detection
en
dc.subject
Youth language
en
dc.subject.ddc
000 Computer science, information & general works::000 Computer science, knowledge & systems::004 Data processing; computer science
dc.title
Youth language and emerging slurs: tackling bias in BERT-based hate speech detection
dc.type
Scientific article
dcterms.bibliographicCitation.doi
10.1007/s43681-025-00701-z
dcterms.bibliographicCitation.journaltitle
AI and Ethics
dcterms.bibliographicCitation.number
4
dcterms.bibliographicCitation.pagestart
3953
dcterms.bibliographicCitation.pageend
3965
dcterms.bibliographicCitation.volume
5
dcterms.bibliographicCitation.url
https://doi.org/10.1007/s43681-025-00701-z
refubium.affiliation
Mathematik und Informatik
refubium.affiliation.other
Institut für Informatik
refubium.funding
Springer Nature DEAL
refubium.note.author
Funded by the open access publication funds of Freie Universität Berlin.
refubium.resourceType.isindependentpub
no
dcterms.accessRights.openaire
open access
dcterms.isPartOf.eissn
2730-5961