Abstract
The shift of public debate to the digital sphere has been accompanied by a rise in online hate speech. While many promising approaches for hate speech classification have been proposed, studies often focus only on a single language, usually English, and do not address three key concerns: post-deployment performance, classifier maintenance, and infrastructural limitations. In this paper, we introduce a new human-in-the-loop BERT-based hate speech classification pipeline and trace its development from initial data collection and annotation all the way to post-deployment. Our classifier, trained on data from our original corpus of over 422k examples, is specifically developed for the inherently multilingual setting of Switzerland and achieves an F1 score of 80.5, outperforming the currently best-performing BERT-based multilingual classifier by 5.8 F1 points in German and 3.6 F1 points in French. Our systematic evaluations over a 12-month period further highlight the vital importance of continuous, human-in-the-loop classifier maintenance to ensure robust hate speech classification post-deployment.