From multilingual web-archives to parallel treebanks in five minutes


Killer, M; Sennrich, R; Volk, Martin (2011). From multilingual web-archives to parallel treebanks in five minutes. In: Conference of the German Society for Computational Linguistics and Language Technology (GSCL) 2011, Hamburg, Germany, 28 September 2011 - 30 September 2011, 57-62.

Abstract

The Tree-to-Tree (t2t) Alignment Pipe is a collection of Python scripts that generate automatically aligned parallel treebanks from multilingual web resources or existing parallel corpora. The pipe contains wrappers for a number of freely available NLP software programs. Once these third-party programs have been installed and the system- and corpus-specific details have been updated, the pipe is designed to generate a parallel treebank with a single program call from a Unix command line. We discuss alignment quality on a fully automatically processed parallel corpus.
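
The "wrappers chained behind a single call" design described in the abstract can be illustrated with a minimal sketch. It is not the t2t code itself: the names `Pipe`, `split_sentences`, `align_sentences` and `run_pipe` are hypothetical, and the two stages are toy stand-ins for the third-party tools the real pipe wraps (here, sentences are split on periods and aligned purely by position).

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# A stage receives the shared state dict and returns an updated copy.
Stage = Callable[[Dict], Dict]


@dataclass
class Pipe:
    """Hypothetical container that runs wrapper stages in order."""
    stages: List[Stage] = field(default_factory=list)

    def add(self, stage: Stage) -> "Pipe":
        self.stages.append(stage)
        return self

    def run(self, state: Dict) -> Dict:
        for stage in self.stages:
            state = stage(state)
        return state


def split_sentences(state: Dict) -> Dict:
    # Toy stand-in for a wrapped sentence splitter: split on periods.
    out = dict(state)
    out["sentences"] = {
        lang: [s.strip() for s in text.split(".") if s.strip()]
        for lang, text in state["text"].items()
    }
    return out


def align_sentences(state: Dict) -> Dict:
    # Toy stand-in for a wrapped sentence aligner: pair by position only.
    out = dict(state)
    src, tgt = state["langs"]
    out["alignment"] = list(zip(state["sentences"][src], state["sentences"][tgt]))
    return out


def run_pipe(text: Dict[str, str], langs) -> Dict:
    # Single entry point: the caller issues one call, the stages run in sequence.
    pipe = Pipe().add(split_sentences).add(align_sentences)
    return pipe.run({"text": text, "langs": langs})


if __name__ == "__main__":
    result = run_pipe(
        {"de": "Das ist gut. Es regnet.", "en": "That is good. It rains."},
        ("de", "en"),
    )
    for pair in result["alignment"]:
        print(pair)
```

In a setup like the one the abstract describes, each stage would instead invoke an installed external program, and the system- and corpus-specific details would be read from configuration rather than hard-coded.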

Statistics

Downloads

120 downloads since deposited on 25 Oct 2011
11 downloads in the past 12 months

Additional indexing

Item Type: Conference or Workshop Item (Paper), refereed, original work
Communities & Collections: 06 Faculty of Arts > Institute of Computational Linguistics
Dewey Decimal Classification: 000 Computer science, knowledge & systems; 410 Linguistics
Language: English
Event End Date: 30 September 2011
Deposited On: 25 Oct 2011 11:40
Last Modified: 15 Dec 2017 08:06
Publisher: Universität Hamburg
Series Name: Arbeiten zur Mehrsprachigkeit - Folge B
Number: 96
ISSN: 0176-599X
Official URL: http://www.corpora.uni-hamburg.de/gscl2011/en/?download=AZM96.pdf

Download

Download PDF: 'From multilingual web-archives to parallel treebanks in five minutes'
Content: Accepted Version
Language: English
Filetype: PDF
Size: 353kB