ZORA (Zurich Open Repository and Archive)

Sparse Attention with Linear Units

Zhang, Biao; Titov, Ivan; Sennrich, Rico (2021). Sparse Attention with Linear Units. In: Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, Online and Punta Cana, Dominican Republic, 7 November 2021 - 11 November 2021. ACL Anthology, 6507-6520.

Abstract

Recently, it has been argued that encoder-decoder models can be made more interpretable by replacing the softmax function in the attention with its sparse variants. In this work, we introduce a novel, simple method for achieving sparsity in attention: we replace the softmax activation with a ReLU (rectified linear unit), and show that sparsity naturally emerges from such a formulation. Training stability is achieved with layer normalization with either a specialized initialization or an additional gating function. Our model, which we call Rectified Linear Attention (ReLA), is easy to implement and more efficient than previously proposed sparse attention mechanisms. We apply ReLA to the Transformer and conduct experiments on five machine translation tasks. ReLA achieves translation performance comparable to several strong baselines, with training and decoding speed similar to that of vanilla attention. Our analysis shows that ReLA delivers a high sparsity rate and head diversity, and the induced cross attention achieves better accuracy with respect to source-target word alignment than recent sparsified softmax-based models. Intriguingly, ReLA heads also learn to attend to nothing (i.e. ‘switch off’) for some queries, which is not possible with sparsified softmax alternatives.
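As a rough illustration of the mechanism summarized above, the following NumPy sketch replaces the softmax over scaled dot-product scores with an element-wise ReLU and normalizes the aggregated values. Function and variable names (rela_attention, rms_norm) and the exact normalization are illustrative assumptions, not the authors' released implementation, which additionally covers the specialized initialization and gated variants described in the abstract.

import numpy as np

def rms_norm(x, eps=1e-6):
    # Root-mean-square normalization over the feature dimension. The paper
    # stabilizes training with a layer-normalization variant; this is a
    # simplified stand-in used only for illustration.
    return x / np.sqrt(np.mean(x ** 2, axis=-1, keepdims=True) + eps)

def rela_attention(queries, keys, values):
    # Scaled dot-product attention with the softmax replaced by ReLU.
    # Negative scores are clipped to exactly zero, so many attention
    # weights vanish (sparsity), and a query whose scores are all
    # negative attends to nothing at all.
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)      # (n_queries, n_keys)
    weights = np.maximum(scores, 0.0)           # ReLU instead of softmax
    outputs = rms_norm(weights @ values)        # normalize the unbounded sums
    return outputs, weights

rng = np.random.default_rng(0)
q = rng.standard_normal((4, 8))   # 4 queries, dimension 8
k = rng.standard_normal((6, 8))   # 6 keys
v = rng.standard_normal((6, 8))   # 6 values
out, w = rela_attention(q, k, v)
print("fraction of exactly-zero attention weights:", (w == 0.0).mean())

With random Gaussian inputs, roughly half of the resulting weights are exactly zero, which is the sparsity the abstract refers to; a softmax can never produce exact zeros, let alone an all-zero row for a query that "switches off".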

Additional indexing

Item Type: Conference or Workshop Item (Paper), refereed, original work
Communities & Collections: 06 Faculty of Arts > Institute of Computational Linguistics
Dewey Decimal Classification: 000 Computer science, knowledge & systems; 410 Linguistics
Language: English
Event End Date: 11 November 2021
Deposited On: 08 Nov 2021 15:53
Last Modified: 28 Apr 2022 07:15
Publisher: ACL Anthology
OA Status: Green
Free access at: Official URL. An embargo period may apply.
Official URL: https://aclanthology.org/2021.emnlp-main.523
Project Information:
  • Funder: SNSF
  • Grant ID: PP00P1_176727
  • Project Title: Multi-Task Learning with Multilingual Resources for Better Natural Language Understanding
Download PDF: 'Sparse Attention with Linear Units'
  • Content: Published Version
  • Licence: Creative Commons: Attribution 4.0 International (CC BY 4.0)

Statistics

Citations

7 citations in Web of Science®
11 citations in Scopus®

Downloads

23 downloads since deposited on 08 Nov 2021
6 downloads in the past 12 months
