Iterative, MT-based sentence alignment of parallel texts


Sennrich, R; Volk, M (2011). Iterative, MT-based sentence alignment of parallel texts. In: NODALIDA 2011, Nordic Conference of Computational Linguistics, Riga, 11 May 2011 - 13 May 2011.

Abstract

Recent research has shown that MT-based sentence alignment is a robust approach for noisy parallel texts.
However, using Machine Translation for sentence alignment causes a chicken-and-egg problem: to train a corpus-based MT system, we need sentence-aligned data, and MT-based sentence alignment depends on an MT system.
We describe a bootstrapping approach to sentence alignment that resolves this circular dependency by computing an initial alignment with length-based methods.
Our evaluation shows that iterative MT-based sentence alignment significantly outperforms widespread alignment approaches on our evaluation set, without requiring any linguistic resources other than the to-be-aligned bitext.
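
The bootstrapping loop described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the length-based step is a toy dynamic program over character lengths (1-to-1 links and skips only, with an arbitrary `skip_penalty`), and `train_mt` and `align_with_mt` are hypothetical callables standing in for a real MT training step and an MT-based aligner, neither of which is specified in this record.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Alignment = List[Tuple[int, int]]            # 1-to-1 links: (source index, target index)
Translator = Callable[[str], Optional[str]]  # hypothetical: source sentence -> translation


def length_based_alignment(src: Sequence[str], tgt: Sequence[str],
                           skip_penalty: float = 10.0) -> Alignment:
    """Toy monotone aligner: a 1-to-1 link costs the character-length
    difference; leaving a sentence unaligned costs skip_penalty (assumed value)."""
    n, m = len(src), len(tgt)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back: List[List[Optional[Tuple[int, int]]]] = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            here = cost[i][j]
            moves = []
            if i < n and j < m:  # link src[i] with tgt[j]
                moves.append((i + 1, j + 1, abs(len(src[i]) - len(tgt[j]))))
            if i < n:            # leave src[i] unaligned
                moves.append((i + 1, j, skip_penalty))
            if j < m:            # leave tgt[j] unaligned
                moves.append((i, j + 1, skip_penalty))
            for ni, nj, step in moves:
                if here + step < cost[ni][nj]:
                    cost[ni][nj] = here + step
                    back[ni][nj] = (i, j)
    # Trace back from the end, keeping only the diagonal (link) moves.
    links: Alignment = []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        if (pi, pj) == (i - 1, j - 1):
            links.append((pi, pj))
        i, j = pi, pj
    links.reverse()
    return links


def bootstrap_alignment(
    src: Sequence[str],
    tgt: Sequence[str],
    train_mt: Callable[[List[Tuple[str, str]]], Translator],
    align_with_mt: Callable[[Sequence[str], Sequence[str], Translator], Alignment],
    iterations: int = 2,
) -> Alignment:
    """Start from a length-based alignment, then alternate between training
    an MT system on the current links and re-aligning with that system."""
    alignment = length_based_alignment(src, tgt)
    for _ in range(iterations):
        translate = train_mt([(src[i], tgt[j]) for i, j in alignment])
        alignment = align_with_mt(src, tgt, translate)
    return alignment
```

The point of the sketch is the control flow: the length-based pass needs no resources beyond the bitext itself, which is what breaks the circular dependency, and each later pass reuses only data produced by the previous one. The number of iterations is left as a parameter because this record does not state it.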

Downloads

249 downloads since deposited on 10 May 2011
95 downloads in the past 12 months

Additional indexing

Item Type: Conference or Workshop Item (Paper), refereed, original work
Communities & Collections: 06 Faculty of Arts > Institute of Computational Linguistics
Dewey Decimal Classification: 000 Computer science, knowledge & systems; 410 Linguistics
Language: English
Event End Date: 13 May 2011
Deposited On: 10 May 2011 09:07
Last Modified: 05 Apr 2016 14:54
Funders: Swiss National Science Foundation
Related URLs: http://www.lumii.lv/nodalida2011/home.html
Permanent URL: https://doi.org/10.5167/uzh-48036

Download

Content: Accepted Version
Language: English
Filetype: PDF
Size: 1MB
