SwissBERT: The Multilingual Language Model for Switzerland


Vamvas, Jannis; Graën, Johannes; Sennrich, Rico (2023). SwissBERT: The Multilingual Language Model for Switzerland. In: 8th Swiss Text Analytics Conference (SwissText), Neuchâtel, Switzerland, 12 June 2023 - 14 June 2023. Association for Computational Linguistics, 54-69.

Abstract

We present SwissBERT, a masked language model created specifically for processing Switzerland-related text. SwissBERT is a pre-trained model that we adapted to news articles written in the national languages of Switzerland -- German, French, Italian, and Romansh. We evaluate SwissBERT on natural language understanding tasks related to Switzerland and find that it tends to outperform previous models on these tasks, especially when processing contemporary news and/or Romansh Grischun. Since SwissBERT uses language adapters, it may be extended to Swiss German dialects in future work. The model and our open-source code are publicly released at https://github.com/ZurichNLP/swissbert.

Statistics

Downloads

45 downloads since deposited on 27 Jun 2023
45 downloads in the past 12 months

Additional indexing

Item Type: Conference or Workshop Item (Paper), not refereed, original work
Communities & Collections:
  • 06 Faculty of Arts > Institute of Computational Linguistics
  • 06 Faculty of Arts > Zurich Center for Linguistics
  • 06 Faculty of Arts > Linguistic Research Infrastructure (LiRI)
Dewey Decimal Classification:
  • 000 Computer science, knowledge & systems
  • 410 Linguistics
Language: English
Event End Date: 14 June 2023
Deposited On: 27 Jun 2023 15:38
Last Modified: 07 Mar 2024 15:03
Publisher: Association for Computational Linguistics
Series Name: Proceedings of the Swiss Text Analytics Conference
OA Status: Green
Free access at: Official URL. An embargo period may apply.
Official URL: https://aclanthology.org/2023.swisstext-1.6.pdf
Related URLs: https://arxiv.org/abs/2303.13310 (Author)
Project Information:
  • Funder: SNSF
  • Grant ID: 176727
  • Project Title: Multi-Task Learning with Multilingual Resources for Better Natural Language Understanding

  • Content: Published Version
  • Language: English
  • Licence: Creative Commons: Attribution 4.0 International (CC BY 4.0)