ZORA (Zurich Open Repository and Archive)

Using Multilingual Word Embeddings for Similarity-Based Word Alignments in a Zero-Shot Setting: Tested on the Case of German–Romansh

Dolev, Eyal Liron. Using Multilingual Word Embeddings for Similarity-Based Word Alignments in a Zero-Shot Setting: Tested on the Case of German–Romansh. 2022, University of Zurich, Faculty of Arts.

Abstract

Using multilingual word embeddings for computing word alignments has been shown to be competitive with statistical word alignment methods. However, the languages used in those experiments were all “seen” languages, i.e., they were part of the training data for the embeddings. In this thesis, I show that multilingual word embeddings taken from mBERT can be used for computing word alignments for the “unseen” language Romansh, aligned against German. The performance is on par with a baseline statistical model (fast_align). I also describe the creation of a gold standard for evaluating the quality of word alignments for German–Romansh, as well as the data collection process for compiling a trilingual corpus of press releases in German, Italian and Romansh, published by the Swiss Canton of Grisons. From this corpus, I extracted around 80,000 unique sentence pairs for each language combination.
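The abstract does not spell out the alignment procedure, so the following is only a minimal sketch of one common similarity-based heuristic: link a source word and a target word when each is the other's nearest neighbour by cosine similarity. The function names and the toy vectors used to exercise it are hypothetical and are not taken from the thesis; in a real setting the vectors would be contextual embeddings (for example from mBERT), one per word of each sentence.

```python
import math
from typing import List, Set, Tuple

Vector = List[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def argmax_alignments(src_vecs: List[Vector], tgt_vecs: List[Vector]) -> Set[Tuple[int, int]]:
    """Return (source index, target index) pairs that are mutual nearest neighbours.

    Illustrative heuristic only (an assumption, not the thesis's method):
    a pair is kept when the target word is the most similar one for the
    source word AND the source word is the most similar one for the target word.
    """
    if not src_vecs or not tgt_vecs:
        return set()

    # Full similarity matrix: rows are source words, columns are target words.
    sim = [[cosine(s, t) for t in tgt_vecs] for s in src_vecs]

    # Best target word for every source word.
    forward = {
        (i, max(range(len(tgt_vecs)), key=lambda j: sim[i][j]))
        for i in range(len(src_vecs))
    }
    # Best source word for every target word.
    backward = {
        (max(range(len(src_vecs)), key=lambda i: sim[i][j]), j)
        for j in range(len(tgt_vecs))
    }
    # Keep only the links both directions agree on.
    return forward & backward
```

Taking the intersection of the two directions favours precision over recall: words without a clear counterpart in the other sentence simply stay unaligned.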

Additional indexing

Item Type: Master's Thesis
Referees: Volk Martin
Communities & Collections: 06 Faculty of Arts > Institute of Computational Linguistics
Dewey Decimal Classification: 000 Computer science, knowledge & systems; 410 Linguistics
Uncontrolled Keywords: NLP, Romansh, word alignment, corpus, multi-lingual corpus, Graubünden
Language: English
Date: 15 August 2022
Deposited On: 22 Jun 2023 08:23
Last Modified: 22 Jun 2023 08:23
Number of Pages: 99
OA Status: Green
Official URL: https://www.cl.uzh.ch/dam/jcr:5eda2d8a-fe61-4860-82c1-421e2e1d18c3/MA_thesis_Digital_Linguistics_Eyal_Dolev.pdf
Content: Published Version

Statistics

Downloads

47 downloads since deposited on 22 Jun 2023
23 downloads in the past 12 months