Why Self-Attention? A Targeted Evaluation of Neural Machine Translation Architectures


Tang, Gongbo; Müller, Mathias; Rios, Annette; Sennrich, Rico (2018). Why Self-Attention? A Targeted Evaluation of Neural Machine Translation Architectures. In: Conference on Empirical Methods in Natural Language Processing (EMNLP), Brussels, 2 November 2018 - 4 November 2018.

Abstract

Recently, non-recurrent architectures (convolutional, self-attentional) have outperformed RNNs in neural machine translation. CNNs and self-attentional networks can connect distant words via shorter network paths than RNNs, and it has been speculated that this improves their ability to model long-range dependencies. However, this theoretical argument has not been tested empirically, nor have alternative explanations for their strong performance been explored in-depth. We hypothesize that the strong performance of CNNs and self-attentional networks could also be due to their ability to extract semantic features from the source text, and we evaluate RNNs, CNNs and self-attention networks on two tasks: subject-verb agreement (where capturing long-range dependencies is required) and word sense disambiguation (where semantic feature extraction is required). Our experimental results show that: 1) self-attentional networks and CNNs do not outperform RNNs in modeling subject-verb agreement over long distances; 2) self-attentional networks perform distinctly better than RNNs and CNNs on word sense disambiguation.
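The "targeted evaluation" in the title is carried out by scoring contrastive translation pairs: the trained model scores a correct reference translation and a minimally different incorrect variant (e.g. one containing a subject-verb agreement error), and accuracy is the fraction of pairs on which the correct variant receives the higher score. Below is a minimal sketch of that scoring loop, assuming a hypothetical score(source, translation) function standing in for the model's log-probability; the names and example pair are illustrative, not the authors' code or test data.

# Minimal sketch of contrastive-pair scoring for targeted evaluation.
# `score(source, translation)` is a hypothetical stand-in for the NMT model's
# log-probability of a translation given a source sentence.

from typing import Callable, Iterable, Tuple

def contrastive_accuracy(
    pairs: Iterable[Tuple[str, str, str]],   # (source, correct, contrastive)
    score: Callable[[str, str], float],
) -> float:
    """Return the fraction of pairs where the correct translation scores higher."""
    decisions = [score(src, good) > score(src, bad) for src, good, bad in pairs]
    return sum(decisions) / len(decisions) if decisions else 0.0

# Illustrative English->German subject-verb agreement pair: the contrastive
# variant differs only in the number of the verb ("liegen" -> "liegt").
pairs = [(
    "The keys to the cabinet are on the table.",
    "Die Schlüssel zum Schrank liegen auf dem Tisch.",
    "Die Schlüssel zum Schrank liegt auf dem Tisch.",
)]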

Additional indexing

Item Type: Conference or Workshop Item (Speech), refereed, original work
Communities & Collections: 06 Faculty of Arts > Institute of Computational Linguistics
Dewey Decimal Classification: 000 Computer science, knowledge & systems; 410 Linguistics
Language: English
Event End Date: 4 November 2018
Deposited On: 02 Nov 2018 14:13
Last Modified: 20 Sep 2019 12:13
Publisher: ACL
OA Status: Green
Official URL: http://aclweb.org/anthology/D18-1458
Related URLs: https://arxiv.org/pdf/1808.08946.pdf
Project Information:
  • Funder: SNSF; Grant ID: 105212_169888; Project Title: Rich Context in Neural Machine Translation
  • Funder: Chinese Scholarship Council; Grant ID: 201607110016

Download

Download PDF: 'Why Self-Attention? A Targeted Evaluation of Neural Machine Translation Architectures'
Content: Published Version
Filetype: PDF
Size: 586kB