On Biasing Transformer Attention Towards Monotonicity


Rios, Annette; Amrhein, Chantal; Aepli, Noëmi; Sennrich, Rico (2021). On Biasing Transformer Attention Towards Monotonicity. In: Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Online, 6 June 2021 - 11 June 2021. Association for Computational Linguistics, 4474-4488.

Abstract

Many sequence-to-sequence tasks in natural language processing are roughly monotonic in the alignment between source and target sequence, and previous work has facilitated or enforced learning of monotonic attention behavior via specialized attention functions or pretraining. In this work, we introduce a monotonicity loss function that is compatible with standard attention mechanisms and test it on several sequence-to-sequence tasks: grapheme-to-phoneme conversion, morphological inflection, transliteration, and dialect normalization. Experiments show that we can achieve largely monotonic behavior. Performance is mixed, with larger gains on top of RNN baselines. General monotonicity does not benefit transformer multihead attention; however, we see isolated improvements when only a subset of heads is biased towards monotonic behavior.
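
The abstract does not spell out the loss itself, so the following is only a minimal sketch of what a monotonicity penalty on ordinary attention weights could look like, not the paper's formulation. It assumes the penalty is computed from the expected source position attended to at each target step, and that only backward movement of that position is penalized. The function names `expected_positions` and `monotonicity_penalty` are hypothetical and chosen for illustration.

```python
def expected_positions(attention):
    """Mean attended source index for each target step.

    `attention` is a list of rows, one per target step; each row holds
    attention weights over source positions and is assumed to sum to 1.
    """
    return [sum(j * w for j, w in enumerate(row)) for row in attention]


def monotonicity_penalty(attention):
    """Average amount by which the expected position moves backwards.

    Illustrative assumption: forward movement of any size is free, and
    only decreases between consecutive target steps are penalized.
    """
    pos = expected_positions(attention)
    if len(pos) < 2:
        return 0.0
    backward = [max(0.0, prev - cur) for prev, cur in zip(pos, pos[1:])]
    return sum(backward) / len(backward)


# Toy attention matrices: 3 target steps over 3 source positions.
monotonic = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
reversed_ = [[0.0, 0.0, 1.0], [0.0, 1.0, 0.0], [1.0, 0.0, 0.0]]

print(monotonicity_penalty(monotonic))
print(monotonicity_penalty(reversed_))
```

In a training setup, a term of this kind would typically be added to the task loss with a weighting factor, and in a multihead model it could be applied to a chosen subset of heads only, which is the setting where the abstract reports isolated improvements.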

Statistics

Downloads

7 downloads since deposited on 25 May 2021
7 downloads since 12 months

Additional indexing

Item Type: Conference or Workshop Item (Speech), original work
Communities & Collections: 06 Faculty of Arts > Institute of Computational Linguistics
Dewey Decimal Classification: 000 Computer science, knowledge & systems; 410 Linguistics
Language: English
Event End Date: 11 June 2021
Deposited On: 25 May 2021 10:08
Last Modified: 25 Oct 2021 16:23
Publisher: Association for Computational Linguistics
OA Status: Green
Free access at: Official URL. An embargo period may apply.
Official URL: https://www.aclweb.org/anthology/2021.naacl-main.354
Project Information:
  • Funder: SNSF
  • Grant ID: PP00P1_176727
  • Project Title: Multi-Task Learning with Multilingual Resources for Better Natural Language Understanding
  • Funder: SNSF
  • Grant ID: P0ZHP1_191934
  • Project Title: Sustainable Natural Language Processing for Low-Resource Language Variations

Download

Green Open Access

Download PDF  'On Biasing Transformer Attention Towards Monotonicity'.
Content: Published Version
Language: English
Filetype: PDF
Size: 649kB
Licence: Creative Commons: Attribution 4.0 International (CC BY 4.0)