The Bottom-up Evolution of Representations in the Transformer: A Study with Machine Translation and Language Modeling Objectives


Voita, Elena; Sennrich, Rico; Titov, Ivan (2019). The Bottom-up Evolution of Representations in the Transformer: A Study with Machine Translation and Language Modeling Objectives. In: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, 3 November 2019 - 7 November 2019, 4387-4397.

Abstract

We seek to understand how the representations of individual tokens and the structure of the learned feature space evolve between layers in deep neural networks under different learning objectives. We focus on the Transformer, as it has proven effective across a range of tasks, including machine translation (MT), standard left-to-right language modeling (LM), and masked language modeling (MLM). Previous work used black-box probing tasks to show that the representations learned by the Transformer differ significantly depending on the objective. In this work, we use canonical correlation analysis and mutual information estimators to study how information flows across Transformer layers, and we observe that the choice of objective determines this process. For example, moving from the bottom to the top layers of a left-to-right language model, information about the past gradually vanishes while predictions about the future take shape. In contrast, for MLM, representations initially acquire information about the context around the token, partially forgetting the token identity and producing a more generalized token representation; the token identity is then recreated at the top MLM layers.
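
As a rough illustration of the CCA-based layer comparison the abstract describes, the sketch below computes a mean canonical correlation between two layers' token representations via SVD whitening. This is a minimal, unweighted variant written for this page: the function name `cca_similarity` and the synthetic data are illustrative assumptions, and the paper's actual analysis (including its projection-weighted CCA variant and mutual information estimators) is more involved.

```python
import numpy as np

def cca_similarity(X, Y, eps=1e-10):
    """Mean canonical correlation between two activation matrices.

    X, Y: arrays of shape (n_tokens, dim), e.g., the same tokens'
    representations taken from two different Transformer layers.
    (Illustrative helper, not the paper's exact PWCCA procedure.)
    """
    # Center each view so correlations are not dominated by mean offsets.
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)

    def whiten(A):
        # Orthonormal basis for the column space; drop near-zero directions.
        U, s, _ = np.linalg.svd(A, full_matrices=False)
        return U[:, s > eps]

    Xw, Yw = whiten(X), whiten(Y)
    # Canonical correlations are the singular values of the
    # cross-covariance of the whitened views.
    corrs = np.linalg.svd(Xw.T @ Yw, compute_uv=False)
    return float(corrs.mean())

# Toy usage: "layer2" is a noisy linear map of "layer1", so their CCA
# similarity is high; an unrelated random matrix scores much lower.
rng = np.random.default_rng(0)
layer1 = rng.normal(size=(1000, 64))
layer2 = layer1 @ rng.normal(size=(64, 64)) + 0.1 * rng.normal(size=(1000, 64))
unrelated = rng.normal(size=(1000, 64))
print(cca_similarity(layer1, layer2))    # close to 1.0
print(cca_similarity(layer1, unrelated)) # much lower
```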

Statistics

Downloads

8 downloads since deposited on 05 Nov 2019
8 downloads in the past 12 months

Additional indexing

Item Type: Conference or Workshop Item (Paper), original work
Communities & Collections: 06 Faculty of Arts > Institute of Computational Linguistics
Dewey Decimal Classification: 000 Computer science, knowledge & systems; 410 Linguistics
Language: English
Event End Date: 7 November 2019
Deposited On: 05 Nov 2019 14:42
Last Modified: 05 Nov 2019 14:42
Publisher: Association for Computational Linguistics
OA Status: Green
Free access at: Official URL. An embargo period may apply.
Official URL: https://www.aclweb.org/anthology/D19-1448.pdf
Related URLs: https://www.aclweb.org/anthology/D19-1448
Project Information:
  • Funder: SNSF
  • Grant ID: PP00P1_176727
  • Project Title: Multi-Task Learning with Multilingual Resources for Better Natural Language Understanding

Download

Green Open Access

Download PDF: 'The Bottom-up Evolution of Representations in the Transformer: A Study with Machine Translation and Language Modeling Objectives'
Content: Published Version
Language: English
Filetype: PDF
Size: 3MB
Licence: Creative Commons: Attribution 4.0 International (CC BY 4.0)