A multilingual gold standard for translation spotting of German compounds and their corresponding multiword units in English, French, Italian and Spanish


Clematide, Simon; Lehner, Stéphanie; Graën, Johannes; Volk, Martin (2018). A multilingual gold standard for translation spotting of German compounds and their corresponding multiword units in English, French, Italian and Spanish. In: Mitkov, Ruslan; Monti, Johanna; Corpas Pastor, Gloria; Seretan, Violeta. Multiword Units in Machine Translation and Translation Technology. Amsterdam: John Benjamins, 125-145.

Abstract

This article describes a new word alignment gold standard for German nominal compounds and their multiword translation equivalents in English, French, Italian, and Spanish. The gold standard contains alignments for each of the ten language pairs, resulting in a total of 8,229 bidirectional alignments. It covers 362 occurrences of 137 different German compounds randomly selected from the corpus of European Parliament plenary sessions, sampled according to the criteria of frequency and morphological complexity. The standard serves for the evaluation and optimisation of automatic word alignments in the context of spotting translations of German compounds. The study also shows that in this text genre, around 80% of German noun types are morphological compounds indicating potential multiword units in their parallel equivalents.

Statistics

Downloads

3 downloads since deposited on 14 Feb 2019
2 downloads in the past 12 months

Additional indexing

Item Type: Book Section, not refereed, original work
Communities & Collections: 06 Faculty of Arts > Institute of Computational Linguistics
Dewey Decimal Classification: 000 Computer science, knowledge & systems; 410 Linguistics
Language: English
Date: 2018
Deposited On: 14 Feb 2019 15:33
Last Modified: 26 Jan 2022 20:09
Publisher: John Benjamins
Number: 341
ISBN: 978 90 272 0060 0
OA Status: Closed
Publisher DOI: https://doi.org/10.1075/cilt.341
Related URLs: http://pub.cl.uzh.ch/purl/compal_gs (Research Data)
Other Identification Number: e-Book ISBN 978 90 272 6420 6
Project Information:
  • Funder: SNSF
  • Grant ID: 105215_146781
  • Project Title: Large-scale Annotation and Alignment of Parallel Corpora for the Investigation of Linguistic Variation