With adolescents and children increasingly present online, it is crucial to evaluate the algorithms designed to protect them from physical and mental harm. This study measures the bias that emerging slurs in youth language introduce into existing BERT-based hate speech detection models. The research establishes a novel framework for identifying language bias within trained networks, introducing a technique to detect emerging hate phrases and to evaluate the unintended bias associated with them. On this basis, three bias test sets are constructed: one for emerging hate speech terms, one for established hate terms, and one to test for overfitting. Using these test sets, three academic and one commercial hate speech detection model are assessed and compared. For a comprehensive evaluation, the research introduces a novel Youth Language Bias Score. Finally, the study applies fine-tuning as a mitigation strategy for youth language bias and rigorously tests and evaluates the newly trained classifier. In summary, the research introduces a novel framework for bias detection, highlights the influence of adolescent language on classifier performance in hate speech classification, and presents the first hate speech classifier specifically trained for online youth language. The study focuses solely on slurs in hateful speech, offering a foundational perspective for the field.