Permanent URL to this publication: http://dx.doi.org/10.5167/uzh-62565
Abdul-Rauf, Sadaf; Fishel, Mark; Lambert, Patrik; Noubours, Sandra; Sennrich, Rico (2012). Extrinsic evaluation of sentence alignment systems. In: Workshop on Creating Cross-language Resources for Disconnected Languages and Styles, Istanbul, 27 May 2012, 6-10.
Abstract
Parallel corpora are usually a collection of documents which are translations of each other. To be useful in NLP applications such as word alignment or machine translation, they first have to be aligned at the sentence level. This paper is a user study briefly reviewing several sentence aligners and evaluating them based on the performance achieved by the SMT systems trained on their output. We conducted experiments on two language pairs and showed that using a more advanced sentence alignment algorithm may yield gains of 0.5 to 1 BLEU points.
| Item Type: | Conference or Workshop Item (Paper), refereed, original work |
|---|---|
| Communities & Collections: | 06 Faculty of Arts > Institute of Computational Linguistics |
| DDC: | 000 Computer science, knowledge & systems 410 Linguistics |
| Language: | English |
| Event End Date: | 27 May 2012 |
| Deposited On: | 04 Jun 2012 11:20 |
| Last Modified: | 18 Oct 2012 13:27 |
| Free access at: | Official URL. An embargo period may apply. |
| Official URL: | http://www.lrec-conf.org/proceedings/lrec2012/workshops/26.Credislas-Proceedings.pdf |
| Related URLs: | http://www-lium.univ-lemans.fr/credislas2012/index.php |