
Understanding Pure Character-Based Neural Machine Translation: The Case of Translating Finnish into English


Tang, Gongbo; Sennrich, Rico; Nivre, Joakim (2020). Understanding Pure Character-Based Neural Machine Translation: The Case of Translating Finnish into English. In: Proceedings of the 28th International Conference on Computational Linguistics, Barcelona, Spain, 8 December 2020 - 13 December 2020, 4251-4262.

Abstract

Recent work has shown that deeper character-based neural machine translation (NMT) models can outperform subword-based models. However, it is still unclear what makes deeper character-based models successful. In this paper, we investigate pure character-based models for translating Finnish into English, exploring their ability to learn word senses and morphological inflections as well as the behaviour of the attention mechanism. We demonstrate that word-level information is distributed over the entire character sequence rather than concentrated in a single character, and that characters at different positions play different roles in learning linguistic knowledge. In addition, character-based models need more layers to encode word senses, which explains why only deeper models outperform subword-based models. The attention distribution pattern shows that separators attract a lot of attention, and we explore a sparse word-level attention to enforce character hidden states to capture the full word-level information. Experimental results show that word-level attention with a single head results in a drop of 1.2 BLEU points.
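To make the sparse word-level attention idea concrete, here is a minimal sketch in PyTorch. It is not the authors' implementation: it assumes the word-level attention restricts the decoder's attention to the encoder hidden states at word-separator positions, so that each separator state is forced to summarize its whole word. The function name, tensor shapes, and the separator id are illustrative assumptions.

    import torch
    import torch.nn.functional as F

    def word_level_attention(queries, enc_states, char_ids, sep_id):
        """Scaled dot-product attention restricted to separator positions.

        queries:    (batch, tgt_len, dim)  decoder states
        enc_states: (batch, src_len, dim)  character-level encoder states
        char_ids:   (batch, src_len)       source character ids
        sep_id:     id of the word-separator character (hypothetical)
        """
        dim = queries.size(-1)
        scores = torch.matmul(queries, enc_states.transpose(1, 2)) / dim ** 0.5
        # Mask out every non-separator position, so each source word is
        # represented only by the hidden state at its boundary.
        is_sep = (char_ids == sep_id).unsqueeze(1)  # (batch, 1, src_len)
        scores = scores.masked_fill(~is_sep, float("-inf"))
        weights = F.softmax(scores, dim=-1)
        return torch.matmul(weights, enc_states), weights

    # Toy usage: every sequence must contain at least one separator,
    # otherwise the masked softmax is undefined.
    B, S, T, D = 1, 10, 4, 8
    sep_id = 0
    char_ids = torch.randint(3, 30, (B, S))
    char_ids[:, [2, 6, 9]] = sep_id  # pretend word boundaries
    queries, enc_states = torch.randn(B, T, D), torch.randn(B, S, D)
    context, weights = word_level_attention(queries, enc_states, char_ids, sep_id)

With a single attention head this hard restriction discards information carried by non-boundary characters, which is consistent with the reported 1.2 BLEU drop.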



Additional indexing

Item Type: Conference or Workshop Item (Paper), refereed, original work
Communities & Collections: 06 Faculty of Arts > Institute of Computational Linguistics
Dewey Decimal Classification: 000 Computer science, knowledge & systems; 410 Linguistics
Language: English
Event End Date: 13 December 2020
Deposited On: 07 Dec 2020 15:56
Last Modified: 14 Dec 2020 10:31
Publisher: International Committee on Computational Linguistics
OA Status: Green
Free access at: Official URL. An embargo period may apply.
Official URL: https://www.aclweb.org/anthology/2020.coling-main.375

Download

Green Open Access

Download PDF: 'Understanding Pure Character-Based Neural Machine Translation: The Case of Translating Finnish into English'
Content: Published Version
Language: English
Filetype: PDF
Size: 349kB
Licence: Creative Commons: Attribution 4.0 International (CC BY 4.0)