

ZORA (Zurich Open Repository and Archive)

Does mBERT Understand Romansh? Evaluating Word Embeddings Using Word Alignment

Dolev, Eyal Liron (2023). Does mBERT Understand Romansh? Evaluating Word Embeddings Using Word Alignment. In: SwissText 2023, Neuchâtel, 12 June 2023 - 14 June 2023. Association for Computational Linguistics, 41-53.

Abstract

We test similarity-based word alignment models (SimAlign and awesome-align) in combination with word embeddings from mBERT and XLM-R on parallel sentences in German and Romansh. Since Romansh is a language unseen by these models during pretraining, this is a zero-shot setting. Using embeddings from mBERT, both models reach an alignment error rate (AER) of 0.22. This outperforms fast_align, a statistical model, and is on par with similarity-based word alignment for seen languages. We interpret these results as evidence that mBERT contains information that is meaningful and applicable to Romansh.
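The two ideas in the abstract, similarity-based alignment and AER, can be illustrated with a minimal sketch. It assumes the general recipe behind similarity-based aligners: compute cosine similarities between the contextual embeddings of source and target words, then keep a link (i, j) only if each word is the other's most similar word. This "mutual argmax" rule is one of the extraction methods SimAlign offers. It also assumes the usual AER definition with sure (S) and possible (P) gold links, AER = 1 - (|A∩S| + |A∩P|) / (|A| + |S|). The helpers `cosine`, `argmax_align` and `aer` and the toy vectors are hypothetical stand-ins, not mBERT output. The sketch does not reproduce the paper's pipeline (subword-to-word pooling, layer choice, awesome-align's fine-tuning).

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def argmax_align(src_vecs, tgt_vecs):
    """Hypothetical mutual-argmax aligner: keep (i, j) only if target j is the most
    similar word for source i AND source i is the most similar word for target j."""
    sim = [[cosine(s, t) for t in tgt_vecs] for s in src_vecs]
    links = set()
    for i, row in enumerate(sim):
        j = max(range(len(row)), key=row.__getitem__)       # best target for source i
        column = [sim[k][j] for k in range(len(sim))]       # all sources vs. target j
        if max(range(len(column)), key=column.__getitem__) == i:
            links.add((i, j))
    return links

def aer(predicted, sure, possible):
    """Alignment error rate; sure links are also counted as possible links."""
    a, s, p = set(predicted), set(sure), set(possible) | set(sure)
    return 1.0 - (len(a & s) + len(a & p)) / (len(a) + len(s))

# Toy example: German "das Haus" vs. Romansh "la chasa" ("the house").
# The vectors are invented for illustration; real ones would come from mBERT.
src = [[1.0, 0.1, 0.0], [0.0, 1.0, 0.2]]  # "das", "Haus"
tgt = [[0.9, 0.2, 0.1], [0.1, 0.9, 0.3]]  # "la", "chasa"
links = argmax_align(src, tgt)
gold = {(0, 0), (1, 1)}
print(sorted(links))                                     # [(0, 0), (1, 1)]
print(aer(links, sure=gold, possible=gold))              # 0.0
print(aer({(0, 0), (1, 0)}, sure=gold, possible=gold))   # 0.5
```

The mutual-argmax rule favours precision. A word is left unaligned rather than linked to a weak match, which is one reason similarity-based aligners can compete with statistical models such as fast_align without any training on the language pair.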
To evaluate performance, we also present a new trilingual corpus, which we call the DERMIT (DE-RM-IT) corpus. It contains press releases issued by the Canton of Grisons in German, Romansh and Italian over the past 25 years. The corpus contains 4 547 parallel documents and approximately 100 000 sentence pairs in each language combination. We additionally present a gold standard for German-Romansh word alignment. The data is available at https://github.com/eyldlv/DERMIT-Corpus.

Additional indexing

Item Type: Conference or Workshop Item (Speech), not refereed, original work
Communities & Collections: 06 Faculty of Arts > Institute of Computational Linguistics
06 Faculty of Arts > Zurich Center for Linguistics
Dewey Decimal Classification: 410 Linguistics
Language: English
Event End Date: 14 June 2023
Deposited On: 04 Jan 2024 13:02
Last Modified: 20 Mar 2024 13:08
Publisher: Association for Computational Linguistics
Series Name: Proceedings of the Swiss Text Analytics Conference
Additional Information: 8th edition
OA Status: Green
Official URL: https://aclanthology.org/2023.swisstext-1.5
Full text (PDF): 'Does mBERT Understand Romansh? Evaluating Word Embeddings Using Word Alignment'
  • Content: Accepted Version
  • Language: English
  • Licence: Creative Commons: Attribution 4.0 International (CC BY 4.0)


Statistics

Downloads

19 downloads since deposited on 04 Jan 2024
15 downloads in the past 12 months
