Sparse Attention with Linear Units
Date
2021
Citations
Zhang, B., Titov, I., & Sennrich, R. (2021). Sparse Attention with Linear Units. 6507–6520. https://aclanthology.org/2021.emnlp-main.523
Abstract
Recently, it has been argued that encoder-decoder models can be made more interpretable by replacing the softmax function in the attention with its sparse variants. In this work, we introduce a novel, simple method for achieving sparsity in attention: we replace the softmax activation with a ReLU, and show that sparsity naturally emerges from such a formulation. Training stability is achieved with layer normalization with either a specialized initialization or an additional gating function. Our model, which we call Rectified Linear Attention (ReLA), is easy to implement and more efficient than previously proposed sparse attention mechanisms.
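The abstract describes attention in which the softmax is replaced by a ReLU and the output is stabilized with layer normalization and an optional gate. The following PyTorch sketch illustrates that idea under stated assumptions: the module name ReLASketch, the sigmoid gate, and the exact placement of the LayerNorm are illustrative choices, not the authors' reference implementation.

import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class ReLASketch(nn.Module):
    """Minimal sketch of Rectified Linear Attention (ReLA): softmax is replaced
    by ReLU, and the attention output is stabilized with layer normalization
    plus an optional gate. Details are assumptions, not the reference code."""

    def __init__(self, d_model: int, use_gate: bool = True):
        super().__init__()
        self.d_model = d_model
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)
        self.out_norm = nn.LayerNorm(d_model)  # normalization for training stability
        self.gate = nn.Linear(d_model, d_model) if use_gate else None

    def forward(self, query, key, value):
        q, k, v = self.q_proj(query), self.k_proj(key), self.v_proj(value)
        # Scaled dot-product scores, but ReLU instead of softmax:
        # negative scores are zeroed out, so the attention weights are sparse.
        scores = torch.matmul(q, k.transpose(-2, -1)) / math.sqrt(self.d_model)
        weights = F.relu(scores)
        out = torch.matmul(weights, v)
        out = self.out_norm(out)  # layer normalization on the attention output
        if self.gate is not None:
            # Optional gating function (assumed sigmoid gate conditioned on the query).
            out = torch.sigmoid(self.gate(query)) * out
        return out

For example, a self-attention call would pass the same tensor of shape (batch, seq_len, d_model) as query, key, and value.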
Additional indexing
Creators (Authors)
Zhang, Biao; Titov, Ivan; Sennrich, Rico
Event Title
2021 Conference on Empirical Methods in Natural Language Processing (EMNLP 2021)
Event Location
Event Country
Event Start Date
Event End Date
Publisher
Association for Computational Linguistics
Page range/Item number
6507–6520
Page end
6520
Item Type
In collections
Dewey Decimal Classification
Language
English
Date available
OA Status
Free Access at
https://aclanthology.org/2021.emnlp-main.523