Statistical machine translation of subtitles: From OpenSubtitles to TED


Müller, Mathias; Volk, Martin (2013). Statistical machine translation of subtitles: From OpenSubtitles to TED. In: Gurevych, Iryna; Biemann, Chris; Zesch, Torsten. Language Processing and Knowledge in the Web. Berlin Heidelberg: Springer, 132-138.

Abstract

In this paper, we describe how the differences between subtitle corpora, OpenSubtitles and TED, influence machine translation quality. In particular, we investigate whether statistical machine translation systems built on their basis can be used interchangeably. Our results show that OpenSubtitles and TED contain very different kinds of subtitles that warrant a subclassification of the genre. In addition, we have taken a closer look at the translation of questions as a sentence type with special word order. Interestingly, we found the BLEU scores for questions to be higher than for random sentences.

Statistics

Citations

2 citations in Web of Science®
3 citations in Scopus®
2 citations in Microsoft Academic

Downloads

404 downloads since deposited on 25 Oct 2013
48 downloads in the past 12 months

Additional indexing

Item Type: Book Section, refereed, original work
Communities & Collections: 06 Faculty of Arts > Institute of Computational Linguistics
Dewey Decimal Classification: 000 Computer science, knowledge & systems; 410 Linguistics
Language: English
Date: 2013
Deposited On: 25 Oct 2013 06:53
Last Modified: 16 Feb 2018 18:13
Publisher: Springer
Series Name: Lecture Notes in Computer Science
Number: 8105
ISSN: 0302-9743
ISBN: 978-3-642-40722-2
Additional Information: 25th International Conference, GSCL 2013, Darmstadt, Germany, September 25-27, 2013. Proceedings. The original publication is available at link.springer.com
OA Status: Green
Publisher DOI: https://doi.org/10.1007/978-3-642-40722-2_14
Related URLs: http://link.springer.com/book/10.1007/978-3-642-40722-2

Download

Download PDF  'Statistical machine translation of subtitles: From OpenSubtitles to TED'.
Content: Accepted Version
Filetype: PDF
Size: 150kB