ZORA (Zurich Open Repository and Archive)

Parallel subtitle corpora and their applications in machine translation and translatology

Bywood, Lindsay; Volk, Martin; Fishel, Mark; Georgakopoulou, Panayota (2013). Parallel subtitle corpora and their applications in machine translation and translatology. Perspectives: Studies in Translatology, 21(4):595-610.

Abstract

SUMAT is a project funded through the EU ICT Policy Support Programme (2011–2014). It involves four subtitling companies (InVision, DDS, Titelbild, VSI) and five technical partners (ALS, ATC, TextShuttle, University of Maribor, Vicomtech).

For the SUMAT project, translated subtitles for seven language pairs have been collected. Four subtitling companies have contributed to this effort, which has so far resulted in collections numbering between 200,000 and 2 million subtitles per language pair. This paper describes the process of converting, classifying and aligning the subtitles. Conversion to a common text format and cross-language alignment were automatically done, using specially built converters, whilst classifying the subtitles according to text genre was a manual process, performed by the teams harvesting the subtitles.

The resulting subtitle corpora are perfectly suited for various applications. The focus of the SUMAT project is to use them as training material for statistical machine translation systems, and this paper will report on the initial experiences with some of the language pairs. In addition, the parallel corpora may serve as input data for parallel concordancing systems. As part of the project, a small prototype has been built which shows how word-aligned parallel subtitles offer new insights for translation science.

Additional indexing

Item Type: Journal Article, refereed, original work
Communities & Collections: 06 Faculty of Arts > Institute of Computational Linguistics
Dewey Decimal Classification: 000 Computer science, knowledge & systems
410 Linguistics
Scopus Subject Areas: Social Sciences & Humanities > Cultural Studies
Social Sciences & Humanities > Language and Linguistics
Social Sciences & Humanities > Linguistics and Language
Social Sciences & Humanities > Literature and Literary Theory
Language: English
Date: 2013
Deposited On: 23 Dec 2013 08:28
Last Modified: 10 Jan 2025 02:41
Publisher: Taylor & Francis
ISSN: 0907-676X
OA Status: Closed
Publisher DOI: https://doi.org/10.1080/0907676X.2013.831920

Statistics

Citations

14 citations in Web of Science®
21 citations in Scopus®

Downloads

5 downloads since deposited on 23 Dec 2013
0 downloads in the past 12 months