Human-in-the-Loop Hate Speech Classification in a Multilingual Context


Kotarcic, Ana; Hangartner, Dominik; Gilardi, Fabrizio; Kurer, Selina; Donnay, Karsten (2023). Human-in-the-Loop Hate Speech Classification in a Multilingual Context. ArXiv.org 2212.02108, Cornell University.

Abstract

The shift of public debate to the digital sphere has been accompanied by a rise in online hate speech. While many promising approaches for hate speech classification have been proposed, studies often focus only on a single language, usually English, and do not address three key concerns: post-deployment performance, classifier maintenance and infrastructural limitations. In this paper, we introduce a new human-in-the-loop BERT-based hate speech classification pipeline and trace its development from initial data collection and annotation all the way to post-deployment. Our classifier, trained using data from our original corpus of over 422k examples, is specifically developed for the inherently multilingual setting of Switzerland and outperforms with its F1 score of 80.5 the currently best-performing BERT-based multilingual classifier by 5.8 F1 points in German and 3.6 F1 points in French. Our systematic evaluations over a 12-month period further highlight the vital importance of continuous, human-in-the-loop classifier maintenance to ensure robust hate speech classification post-deployment.
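For readers unfamiliar with BERT-based classification as described in the abstract, the following is a minimal, hypothetical sketch of running a multilingual BERT model over German and French texts for a binary hate/non-hate label using Hugging Face Transformers. The model name, example sentences, and label scheme are illustrative assumptions and do not reproduce the authors' pipeline, corpus, or human-in-the-loop components.

```python
# Illustrative sketch only, NOT the authors' pipeline.
# Assumed: a generic multilingual BERT backbone and a binary label scheme.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_NAME = "bert-base-multilingual-cased"  # assumed backbone; the paper's model may differ

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)

# Hypothetical German and French comments standing in for the Swiss multilingual setting.
texts = [
    "Das ist ein ganz normaler Kommentar.",
    "Ceci est un commentaire sur l'actualité.",
]
inputs = tokenizer(texts, padding=True, truncation=True, max_length=128, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Assumed label convention: 0 = non-hateful, 1 = hateful.
predictions = logits.argmax(dim=-1)
print(predictions.tolist())
```

In a human-in-the-loop setup of the kind the abstract describes, such predictions would periodically be reviewed by annotators and fed back into retraining, rather than used as-is.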



Additional indexing

Item Type: Working Paper
Communities & Collections: 06 Faculty of Arts > Institute of Political Science
08 Research Priority Programs > Digital Society Initiative
Dewey Decimal Classification: 320 Political science
Language: English
Date: 1 January 2023
Deposited On: 24 Mar 2023 15:27
Last Modified: 29 Sep 2023 07:03
Series Name: ArXiv.org
ISSN: 2331-8422
ISBN: 978-1-959429-41-8
Additional Information: Findings of EMNLP 2022
OA Status: Green
Publisher DOI: https://doi.org/10.48550/arXiv.2212.02108
  • Content: Accepted Version
  • Language: English
  • Licence: Creative Commons: Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)