Treatment of Markup in Statistical Machine Translation


Müller, Mathias (2017). Treatment of Markup in Statistical Machine Translation. In: Third Workshop on Discourse in Machine Translation, Copenhagen, Denmark, 8 September 2017, 36-46.

Abstract

We present work on handling XML markup in Statistical Machine Translation (SMT). The methods we propose can be used to effectively preserve markup (for instance inline formatting or structure) and to place markup correctly in a machine-translated segment. We evaluate our approaches with parallel data that naturally contains markup or where markup was inserted to create synthetic examples. In our experiments, hybrid reinsertion has proven the most accurate method to handle markup, while alignment masking and alignment reinsertion should be regarded as viable alternatives. We provide implementations of all the methods described and they are freely available as an open-source framework.

Additional indexing

Item Type: Conference or Workshop Item (Other), refereed, original work
Communities & Collections: 06 Faculty of Arts > Institute of Computational Linguistics
Dewey Decimal Classification: 000 Computer science, knowledge & systems; 410 Linguistics
Language: English
Event End Date: 8 September 2017
Deposited On: 03 Oct 2017 13:44
Last Modified: 03 Oct 2017 13:44
Publisher: Association for Computational Linguistics
Free access at: Official URL. An embargo period may apply.
Official URL: http://www.aclweb.org/anthology/W/W17/W17-4804.pdf
Related URLs: https://gitlab.cl.uzh.ch/mt/mtrain
