Abstract
Encoder models trained to embed sentences or short documents have proven useful for tasks such as semantic search and topic modeling. In this paper, a version of the SwissBERT encoder model fine-tuned specifically for this purpose is presented. SwissBERT contains language adapters for the four national languages of Switzerland – German, French, Italian, and Romansh – and has been pre-trained on a large number of news articles in those languages. Using contrastive learning on a subset of the original training dataset, a fine-tuned version called SentenceSwissBERT was trained. Multilingual experiments on document retrieval, text classification, and topic modeling in a Switzerland-specific setting show that SentenceSwissBERT outperforms both the original model and comparable baselines. The model is openly available for research use.