
ZORA (Zurich Open Repository and Archive)

Crossing Sentence Boundaries in Machine Translation

Mascarell, Laura. Crossing Sentence Boundaries in Machine Translation. 2017, University of Zurich, Faculty of Arts.

Abstract

Machine Translation systems translate the sentences of a document independently of one another, ignoring discourse and any context information that crosses sentence boundaries. Often, the context within a single sentence is not enough to disambiguate a word correctly, and the systems make incorrect lexical choices that degrade the quality of the translations.

In this thesis, we attempt to integrate discourse knowledge into Machine Translation as a means to improve lexical choice in translation. Specifically, we develop discourse-aware methods for phrase-based Statistical Machine Translation systems, such as the sentence-level decoder Moses and the document-oriented decoder Docent. We also study the integration of discourse into Neural Machine Translation, whose high-quality output has recently attracted the attention of the Machine Translation community.

To improve the lexical choice of Machine Translation systems, our methods focus mainly on the consistent translation of nouns and on exploiting lexical chains, that is, chains of semantically related words in a document. Translation consistency, which consists of identifying a correct translation of a word and applying it consistently across the document, has been addressed in the literature with mixed results. In our experiments, we apply consistency to the translation of nouns in specific cases where a consistent translation is expected, such as references to compounds and pairs of repeated nouns. In other experiments, we use the semantic context provided by lexical chains in the source document to preserve the semantic similarity between words in the translation as well.
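To make the notion of translation consistency concrete, here is a minimal sketch that assumes the source nouns of a document have already been aligned to the words chosen for them in the translation. The input format and the function name `enforce_consistency` are hypothetical. It applies a simple majority vote per repeated source noun; this is a generic heuristic for illustration only, not the method developed in the thesis, which restricts consistency to specific cases such as compound references and repeated nouns.

```python
from collections import Counter


def enforce_consistency(alignments):
    """Rewrite noun translations so each repeated source noun is translated the same way.

    alignments: list of (source_noun, chosen_translation) pairs, one per occurrence
    in document order (a hypothetical input format assumed for this sketch).
    Returns a new list in which every occurrence of a source noun uses the
    translation chosen most often for that noun across the whole document.
    """
    # Count how often each translation was chosen for each source noun.
    counts = {}
    for src, tgt in alignments:
        counts.setdefault(src, Counter())[tgt] += 1

    # Pick the majority translation; on ties, Counter.most_common keeps
    # the translation that was encountered first in the document.
    best = {src: c.most_common(1)[0][0] for src, c in counts.items()}

    # Apply the chosen translation consistently to every occurrence.
    return [(src, best[src]) for src, _ in alignments]


if __name__ == "__main__":
    doc = [
        ("Kraftwerk", "power plant"),
        ("Werk", "plant"),
        ("Werk", "factory"),
        ("Werk", "plant"),
    ]
    for src, tgt in enforce_consistency(doc):
        print(src, "->", tgt)
```

Majority voting is only one way to pick the "correct" translation; a real system would also need to decide when consistency is appropriate at all, since a word can legitimately take different translations within one document.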

Additional indexing

Item Type: Dissertation (monographical)
Referees: Volk Martin, Fishel Mark
Communities & Collections: 06 Faculty of Arts > Institute of Computational Linguistics; UZH Dissertations
Dewey Decimal Classification: 000 Computer science, knowledge & systems; 410 Linguistics
Language: English
Date: 2017
Deposited On: 13 Feb 2019 16:12
Last Modified: 25 Aug 2020 14:38
Number of Pages: 130
OA Status: Green
Full text: PDF, 'Crossing Sentence Boundaries in Machine Translation' (Published Version, English)
