
On Romanization for Model Transfer Between Scripts in Neural Machine Translation


Amrhein, Chantal; Sennrich, Rico (2020). On Romanization for Model Transfer Between Scripts in Neural Machine Translation. In: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: Findings, Online, 16 November 2020 - 20 November 2020, 2461-2469.

Abstract

Transfer learning is a popular strategy to improve the quality of low-resource machine translation. For an optimal transfer of the embedding layer, the child and parent model should share a substantial part of the vocabulary. This is not the case when transferring to languages with a different script. We explore the benefit of romanization in this scenario. Our results show that romanization entails information loss and is thus not always superior to simpler vocabulary transfer methods, but can improve the transfer between related languages with different scripts. We compare two romanization tools and find that they exhibit different degrees of information loss, which affects translation quality. Finally, we extend romanization to the target side, showing that this can be a successful strategy when coupled with a simple deromanization model.
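The abstract's key observation is that romanization can be lossy: when two distinct source-script characters map to the same Latin string, the original text cannot be recovered deterministically, which is why the paper pairs target-side romanization with a deromanization model. The following minimal sketch (a hypothetical toy mapping, not either of the romanization tools compared in the paper) illustrates this ambiguity:

```python
# Toy romanization table (hypothetical, for illustration only): two distinct
# Cyrillic letters map to the same Latin string, so the mapping is not
# invertible and information is lost.
ROMANIZATION = {
    "е": "e",   # Cyrillic Ie
    "э": "e",   # Cyrillic E -- collides with the mapping above
    "м": "m",
    "н": "n",
    "ш": "sh",
}

def romanize(text: str) -> str:
    """Map each character through the table, passing unknown characters through."""
    return "".join(ROMANIZATION.get(ch, ch) for ch in text)

# Two different source strings romanize to the same output, so a
# deromanization step must disambiguate from context rather than by lookup.
print(romanize("ем"))   # -> "em"
print(romanize("эм"))   # -> "em"
```

Real romanization tools differ in how many such collisions they introduce, which is the "different degrees of information loss" the abstract refers to.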



Additional indexing

Item Type: Conference or Workshop Item (Paper), original work
Communities & Collections: 06 Faculty of Arts > Institute of Computational Linguistics
Dewey Decimal Classification: 000 Computer science, knowledge & systems; 410 Linguistics
Language: English
Event End Date: 20 November 2020
Deposited On: 10 Nov 2020 10:43
Last Modified: 27 Nov 2020 07:34
Publisher: Association for Computational Linguistics
OA Status: Green
Official URL: https://www.aclweb.org/anthology/2020.findings-emnlp.223
Project Information:
  • Funder: SNSF
  • Grant ID: PP00P1_176727
  • Project Title: Multi-Task Learning with Multilingual Resources for Better Natural Language Understanding

Download

Green Open Access

Download PDF: 'On Romanization for Model Transfer Between Scripts in Neural Machine Translation'
Content: Published Version
Language: English
Filetype: PDF
Size: 268kB
Licence: Creative Commons: Attribution 4.0 International (CC BY 4.0)