ZORA (Zurich Open Repository and Archive)

Fine-tuning the SwissBERT Encoder Model for Embedding Sentences and Documents

Grosjean, Juri Leander. Fine-tuning the SwissBERT Encoder Model for Embedding Sentences and Documents. 2024, University of Zurich, Faculty of Arts.

Abstract

Encoder models trained to embed sentences or short documents have proven useful for tasks such as semantic search and topic modeling. This thesis presents a version of the SwissBERT encoder model fine-tuned specifically for this purpose. SwissBERT contains language adapters for the four national languages of Switzerland – German, French, Italian, and Romansh – and was pre-trained on a large number of news articles in those languages. Using contrastive learning on a subset of the original training dataset, a fine-tuned version called SentenceSwissBERT was trained. Multilingual experiments on document retrieval, text classification, and topic modeling in a Switzerland-specific setting show that SentenceSwissBERT outperforms both the original model and comparable baselines. The model is openly available for research use.
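The contrastive-learning objective mentioned in the abstract can be illustrated with a minimal sketch. This is not the thesis's actual training code: it assumes an InfoNCE-style loss with in-batch negatives (a common choice for fine-tuning sentence encoders), and the embeddings below are random stand-ins for encoder outputs.

```python
# Minimal sketch of contrastive learning with in-batch negatives (InfoNCE).
# Paired texts (anchor, positive) are pulled together; the other positives
# in the batch act as negatives. All names and values are illustrative.
import numpy as np

def info_nce_loss(anchors: np.ndarray, positives: np.ndarray,
                  temperature: float = 0.05) -> float:
    """Mean cross-entropy over cosine-similarity logits, where row i's
    correct "class" is column i (its paired positive)."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature                    # (batch, batch) similarities
    logits -= logits.max(axis=1, keepdims=True)       # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-log_probs.diagonal().mean())

rng = np.random.default_rng(0)
anchors = rng.normal(size=(4, 8))
# Near-identical pairs (low loss) vs. unrelated pairs (high loss):
loss_matched = info_nce_loss(anchors, anchors + 0.01 * rng.normal(size=(4, 8)))
loss_random = info_nce_loss(anchors, rng.normal(size=(4, 8)))
```

During fine-tuning, a loss of this shape is minimized over the encoder's parameters so that embeddings of matching texts become more similar than embeddings of unrelated ones.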

Additional indexing

Item Type: Master's Thesis
Referees: Schneider, Gerold
Communities & Collections: 06 Faculty of Arts > Zurich Center for Linguistics
Dewey Decimal Classification: 410 Linguistics; 400 Language
Language: English
Date: 1 June 2024
Deposited On: 01 Nov 2024 09:05
Last Modified: 10 Dec 2024 04:17
OA Status: Green
Download PDF: 'Fine-tuning the SwissBERT Encoder Model for Embedding Sentences and Documents'.
  • Content: Published Version
  • Language: English

Statistics

Downloads

28 downloads since deposited on 01 Nov 2024
30 downloads in the past 12 months
