Statistical machine translation of subtitles: From OpenSubtitles to TED


Müller, Mathias; Volk, Martin (2013). Statistical machine translation of subtitles: From OpenSubtitles to TED. In: Gurevych, Iryna; Biemann, Chris; Zesch, Torsten. Language Processing and Knowledge in the Web. Berlin Heidelberg: Springer, 132-138.

Abstract

In this paper, we describe how the differences between subtitle corpora, OpenSubtitles and TED, influence machine translation quality. In particular, we investigate whether statistical machine translation systems built on their basis can be used interchangeably. Our results show that OpenSubtitles and TED contain very different kinds of subtitles that warrant a subclassification of the genre. In addition, we have taken a closer look at the translation of questions as a sentence type with special word order. Interestingly, we found the BLEU scores for questions to be higher than for random sentences.

Statistics

Citations

2 citations in Web of Science®
1 citation in Scopus®

Downloads

341 downloads since deposited on 25 Oct 2013
95 downloads in the past 12 months

Additional indexing

Item Type: Book Section, refereed, original work
Communities & Collections: 06 Faculty of Arts > Institute of Computational Linguistics
Dewey Decimal Classification: 000 Computer science, knowledge & systems; 410 Linguistics
Language: English
Date: 2013
Deposited On: 25 Oct 2013 06:53
Last Modified: 05 Apr 2016 17:04
Publisher: Springer
Series Name: Lecture Notes in Computer Science
Number: 8105
ISSN: 0302-9743
ISBN: 978-3-642-40722-2
Additional Information: 25th International Conference, GSCL 2013, Darmstadt, Germany, September 25-27, 2013. Proceedings. The original publication is available at link.springer.com
Publisher DOI: https://doi.org/10.1007/978-3-642-40722-2_14
Related URLs: http://link.springer.com/book/10.1007/978-3-642-40722-2

Download

Content: Accepted Version
Filetype: PDF
Size: 150kB
