A Large-Scale Test Set for the Evaluation of Context-Aware Pronoun Translation


Müller, Mathias; Rios, Annette; Voita, Elena; Sennrich, Rico (2018). A Large-Scale Test Set for the Evaluation of Context-Aware Pronoun Translation. In: Third Conference on Machine Translation (WMT 18), Brussels, Belgium, 31 October 2018 - 1 November 2018. ACL, 61-72.

Abstract

The translation of pronouns presents a special challenge to machine translation to this day, since it often requires context outside the current sentence. Recent work on models that have access to information across sentence boundaries has seen only moderate improvements in terms of automatic evaluation metrics such as BLEU. However, metrics that quantify the overall translation quality are ill-equipped to measure gains from additional context. We argue that a different kind of evaluation is needed to assess how well models translate inter-sentential phenomena such as pronouns. This paper therefore presents a test suite of contrastive translations focused specifically on the translation of pronouns. Furthermore, we perform experiments with several context-aware models. We show that, while gains in BLEU are moderate for those systems, they outperform baselines by a large margin in terms of accuracy on our contrastive test set. Our experiments also show the effectiveness of parameter tying for multi-encoder architectures.
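
To make the evaluation idea concrete: in a contrastive test set, the model scores a correct translation against minimally different variants that differ only in the pronoun, and accuracy is the fraction of examples where the correct variant receives the highest score. The following is a minimal illustrative sketch, assuming a hypothetical score(source, translation) function that returns a model's (log-)probability of a candidate translation; the paper's actual test set format and scoring tooling may differ.

```python
from typing import Callable, List, Tuple

def contrastive_accuracy(
    score: Callable[[str, str], float],
    examples: List[Tuple[str, str, List[str]]],
) -> float:
    """Return the fraction of examples where the correct translation
    receives a higher model score than every contrastive variant."""
    correct = 0
    for source, reference, variants in examples:
        ref_score = score(source, reference)
        if all(ref_score > score(source, v) for v in variants):
            correct += 1
    return correct / len(examples)

# Hypothetical example pair: the English pronoun "it" must be rendered as
# German "er" because its antecedent ("Tisch") is grammatically masculine.
examples = [
    (
        "I bought a table. It is wobbly.",
        "Ich habe einen Tisch gekauft. Er wackelt.",       # correct pronoun
        [
            "Ich habe einen Tisch gekauft. Es wackelt.",   # contrastive variant
            "Ich habe einen Tisch gekauft. Sie wackelt.",  # contrastive variant
        ],
    ),
]
```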

Additional indexing

Item Type: Conference or Workshop Item (Other), refereed, original work
Communities & Collections: 06 Faculty of Arts > Institute of Computational Linguistics
Dewey Decimal Classification: 000 Computer science, knowledge & systems; 410 Linguistics
Language: English
Event End Date: 1 November 2018
Deposited On: 02 Oct 2018 13:42
Last Modified: 13 Oct 2023 13:33
Publisher: ACL
OA Status: Green
Free access at: Official URL. An embargo period may apply.
Official URL: http://www.statmt.org/wmt18/pdf/WMT007.pdf
Project Information:
  • Funder: SNSF
  • Grant ID: 105212_169888
  • Project Title: Rich Context in Neural Machine Translation
Content: Published Version