
Evaluating Discourse Phenomena in Neural Machine Translation


Bawden, Rachel; Sennrich, Rico; Birch, Alexandra; Haddow, Barry (2018). Evaluating Discourse Phenomena in Neural Machine Translation. In: Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), New Orleans, 1 June 2018 - 6 June 2018, 1304-1313.

Abstract

For machine translation to tackle discourse phenomena, models must have access to extra-sentential linguistic context. There has been recent interest in modelling context in neural machine translation (NMT), but models have been principally evaluated with standard automatic metrics, poorly adapted to evaluating discourse phenomena. In this article, we present hand-crafted, discourse test sets, designed to test the models' ability to exploit previous source and target sentences. We investigate the performance of recently proposed multi-encoder NMT models trained on subtitles for English to French. We also explore a novel way of exploiting context from the previous sentence. Despite gains using BLEU, multi-encoder models give limited improvement in the handling of discourse phenomena: 50% accuracy on our coreference test set and 53.5% for coherence/cohesion (compared to a non-contextual baseline of 50%). A simple strategy of decoding the concatenation of the previous and current sentence leads to good performance, and our novel strategy of multi-encoding and decoding of two sentences leads to the best performance (72.5% for coreference and 57% for coherence/cohesion), highlighting the importance of target-side context.
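The concatenation strategy mentioned in the abstract can be illustrated with a short preprocessing sketch (not the authors' code; the separator token name and function are hypothetical): each training example is built by joining the previous and current sentence with a special token, on both the source and target sides.

```python
# Illustrative sketch of the "concatenation" strategy: prepend the previous
# sentence to the current one, separated by a special token, so the model
# sees (and, on the target side, generates) one sentence of context.

CONCAT_TOKEN = "<CONCAT>"  # hypothetical separator token


def make_concat_pairs(src_sents, tgt_sents):
    """Build (source, target) training pairs where each side contains the
    previous and current sentence joined by the separator token.
    The first sentence of a document has an empty context."""
    pairs = []
    for i, (src, tgt) in enumerate(zip(src_sents, tgt_sents)):
        prev_src = src_sents[i - 1] if i > 0 else ""
        prev_tgt = tgt_sents[i - 1] if i > 0 else ""
        pairs.append((
            f"{prev_src} {CONCAT_TOKEN} {src}".strip(),
            f"{prev_tgt} {CONCAT_TOKEN} {tgt}".strip(),
        ))
    return pairs


pairs = make_concat_pairs(
    ["Hello.", "How are you?"],
    ["Bonjour.", "Comment vas-tu ?"],
)
# pairs[1] == ("Hello. <CONCAT> How are you?",
#              "Bonjour. <CONCAT> Comment vas-tu ?")
```

At test time, the model decodes both sentences and only the part after the separator is kept as the translation of the current sentence; this is what gives the decoder access to target-side context.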


Statistics

Downloads

25 downloads since deposited on 10 Apr 2019

Additional indexing

Item Type: Conference or Workshop Item (Paper), original work
Communities & Collections: 06 Faculty of Arts > Institute of Computational Linguistics
Dewey Decimal Classification: 000 Computer science, knowledge & systems; 410 Linguistics
Language: English
Event End Date: 6 June 2018
Deposited On: 10 Apr 2019 12:50
Last Modified: 25 Sep 2019 00:33
Publisher: Association for Computational Linguistics
OA Status: Green
Free access at: Official URL. An embargo period may apply.
Official URL: http://aclweb.org/anthology/N18-1118
Project Information:
  • Funder: SNSF
  • Grant ID: 105212_169888
  • Project Title: Rich Context in Neural Machine Translation

Download

Green Open Access

Download PDF: 'Evaluating Discourse Phenomena in Neural Machine Translation'
Content: Published Version
Language: English
Filetype: PDF
Size: 313kB
Licence: Creative Commons: Attribution 4.0 International (CC BY 4.0)