Context-Aware Monolingual Repair for Neural Machine Translation


Voita, Elena; Sennrich, Rico; Titov, Ivan (2019). Context-Aware Monolingual Repair for Neural Machine Translation. In: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, 3 November 2019 - 7 November 2019. Association for Computational Linguistics, 876-885.

Abstract

Modern sentence-level NMT systems often produce plausible translations of isolated sentences. However, when put in context, these translations may end up being inconsistent with each other. We propose a monolingual DocRepair model to correct inconsistencies between sentence-level translations. DocRepair performs automatic post-editing on a sequence of sentence-level translations, refining translations of sentences in the context of each other. For training, the DocRepair model requires only monolingual document-level data in the target language. It is trained as a monolingual sequence-to-sequence model that maps inconsistent groups of sentences into consistent ones. The consistent groups come from the original training data; the inconsistent groups are obtained by sampling round-trip translations for each isolated sentence. We show that this approach successfully imitates the inconsistencies we aim to fix: using contrastive evaluation, we show large improvements in the translation of several contextual phenomena in an English-Russian translation task, as well as improvements in the BLEU score. We also conduct a human evaluation and show that annotators strongly prefer the corrected translations to the baseline ones. Moreover, we analyze which discourse phenomena are hard to capture using monolingual data only.
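
The data construction described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `make_repair_pairs`, the separator token `_eos`, the group size of 4, and the `toy_round_trip` stand-in are all assumptions made for illustration. In a real setup, `round_trip` would sample a translation into the source language and back using two trained NMT models.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

SEP = " _eos "  # hypothetical separator token between sentences in a group


def make_repair_pairs(
    documents: Iterable[List[str]],
    round_trip: Callable[[str], str],
    group_size: int = 4,  # assumed group length; not stated in the abstract
) -> Iterator[Tuple[str, str]]:
    """Yield (inconsistent_source, consistent_target) training pairs."""
    for sentences in documents:
        # Slide over consecutive sentences to form consistent target groups.
        for start in range(0, len(sentences) - group_size + 1):
            group = sentences[start:start + group_size]
            # Each sentence is round-tripped in isolation, so the noisy
            # versions cannot agree with each other through context.
            noisy = [round_trip(s) for s in group]
            yield SEP.join(noisy), SEP.join(group)


def toy_round_trip(sentence: str) -> str:
    """Stand-in for target -> source -> target sampling with NMT models."""
    return sentence.lower()


docs = [["He left.", "She stayed.", "They met later.", "It rained.", "Nobody cared."]]
pairs = list(make_repair_pairs(docs, toy_round_trip, group_size=4))
```

Each pair maps a group of independently round-tripped sentences (the model input) to the original group from the monolingual document (the model output), which is the direction in which a sequence-to-sequence repair model would be trained.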

Additional indexing

Item Type: Conference or Workshop Item (Paper), not_refereed, original work
Communities & Collections: 06 Faculty of Arts > Institute of Computational Linguistics
Dewey Decimal Classification: 000 Computer science, knowledge & systems; 410 Linguistics
Language: English
Event End Date: 7 November 2019
Deposited On: 05 Nov 2019 14:41
Last Modified: 27 Nov 2020 07:32
Publisher: Association for Computational Linguistics
OA Status: Green
Official URL: https://www.aclweb.org/anthology/D19-1081.pdf
Related URLs: https://www.aclweb.org/anthology/D19-1081
Project Information:
  • Funder: H2020
  • Grant ID: 825460
  • Project Title: European Live Translator
  • Funder: Royal Society
  • Grant ID: NAF\R1\180122
  • Project Title: Towards Discourse-level Neural Machine Translation

Download

Content: Published Version
Language: English
Filetype: PDF
Size: 638kB
Licence: Creative Commons: Attribution 4.0 International (CC BY 4.0)