Statistical machine translation of subtitles: From OpenSubtitles to TED


Müller, Mathias; Volk, Martin (2013). Statistical machine translation of subtitles: From OpenSubtitles to TED. In: Gurevych, Iryna; Biemann, Chris; Zesch, Torsten. Language Processing and Knowledge in the Web. Berlin Heidelberg: Springer, 132-138.

Abstract

In this paper, we describe how the differences between two subtitle corpora, OpenSubtitles and TED, influence machine translation quality. In particular, we investigate whether statistical machine translation systems built on these corpora can be used interchangeably. Our results show that OpenSubtitles and TED contain very different kinds of subtitles that warrant a subclassification of the genre. In addition, we have taken a closer look at the translation of questions as a sentence type with special word order. Interestingly, we found the BLEU scores for questions to be higher than for random sentences.

Statistics

Citations

6 citations in Web of Science®
11 citations in Scopus®

Downloads

1457 downloads since deposited on 25 Oct 2013
246 downloads in the past 12 months

Additional indexing

Item Type: Book Section, refereed, original work
Communities & Collections: 06 Faculty of Arts > Institute of Computational Linguistics
Dewey Decimal Classification: 000 Computer science, knowledge & systems; 410 Linguistics
Scopus Subject Areas: Physical Sciences > Theoretical Computer Science; Physical Sciences > General Computer Science
Language: English
Date: 2013
Deposited On: 25 Oct 2013 06:53
Last Modified: 24 Jan 2022 01:49
Publisher: Springer
Series Name: Lecture Notes in Computer Science
Number: 8105
ISSN: 0302-9743
ISBN: 978-3-642-40722-2
Additional Information: 25th International Conference, GSCL 2013, Darmstadt, Germany, September 25-27, 2013. Proceedings. The original publication is available at link.springer.com
OA Status: Green
Publisher DOI: https://doi.org/10.1007/978-3-642-40722-2_14
Related URLs: http://link.springer.com/book/10.1007/978-3-642-40722-2