Crossing Sentence Boundaries in Machine Translation


Mascarell, Laura. Crossing Sentence Boundaries in Machine Translation. 2017, University of Zurich, Faculty of Arts.

Abstract

Machine Translation systems translate each sentence of a document independently, ignoring the discourse and any context information that crosses sentence boundaries. Often, the context within a single sentence is not enough to disambiguate a word correctly, and the systems make incorrect lexical choices that degrade the quality of the translations.

In this thesis, we attempt to integrate discourse knowledge into Machine Translation as a means to improve lexical choice in translation. Specifically, we develop discourse-aware methods for phrase-based Statistical Machine Translation systems, such as the sentence-level decoder Moses and the document-oriented decoder Docent. We also study the integration of discourse into Neural Machine Translation, whose high-quality output has recently attracted the attention of the Machine Translation community.

To improve the lexical choice of Machine Translation systems, our methods mostly focus on the consistent translation of nouns and on exploiting lexical chains, that is, chains of semantically related words in a document. Translation consistency, which consists of identifying a correct translation of a word and applying it consistently across the document, has been addressed in the literature with mixed results. In our experiments, we apply consistency to the translation of nouns in particular cases where a consistent translation is expected, such as references to compounds and pairs of repeated nouns. In other experiments, we use the semantic context provided by lexical chains in the source document to preserve the semantic similarity between words in the translation as well.
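To make the notion of translation consistency concrete, here is a minimal Python sketch of its simplest form: a document-level majority vote that picks the most frequent translation of each repeated source noun and applies it to every occurrence. This is not the method developed in the thesis. The input format (pairs of a source noun and its aligned translation, in document order) and the function name enforce_consistency are hypothetical.

```python
from collections import Counter, defaultdict


def enforce_consistency(occurrences):
    """Map every occurrence of a source noun to a single translation.

    occurrences: list of (source_noun, translation) pairs in document order.
    This input format is a hypothetical stand-in for word alignments taken
    from a sentence-by-sentence MT output.
    """
    # Count how often each translation was chosen for each source noun.
    counts = defaultdict(Counter)
    for src, tgt in occurrences:
        counts[src][tgt] += 1

    # Majority vote per source noun. Ties go to the translation seen first:
    # Counter keeps insertion order, and max() returns the first maximum.
    chosen = {src: max(c, key=c.__getitem__) for src, c in counts.items()}

    # Rewrite every occurrence with the chosen translation.
    return [(src, chosen[src]) for src, _ in occurrences]


doc = [("Bank", "bank"), ("Zinssatz", "interest rate"),
       ("Bank", "bench"), ("Bank", "bank")]
print(enforce_consistency(doc))
```

Applied blindly to every noun, such a vote also overrides occurrences where a different translation is actually correct. The thesis avoids this by applying consistency only in particular cases where a consistent translation is expected, such as references to compounds and pairs of repeated nouns.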

Statistics

Downloads

17 downloads since deposited on 13 Feb 2019
17 downloads in the past 12 months

Additional indexing

Item Type: Dissertation (monographical)
Referees: Volk Martin, Fishel Mark
Communities & Collections: 06 Faculty of Arts > Institute of Computational Linguistics
Dewey Decimal Classification: 000 Computer science, knowledge & systems; 410 Linguistics
Language: English
Date: 2017
Deposited On: 13 Feb 2019 16:12
Last Modified: 17 Sep 2019 19:19
Number of Pages: 130
OA Status: Green

Download

PDF: 'Crossing Sentence Boundaries in Machine Translation'
Content: Published Version
Language: English
Filetype: PDF
Size: 1MB