Innovations in parallel corpus search tools


Volk, Martin; Graën, Johannes; Callegaro, Elena (2014). Innovations in parallel corpus search tools. In: Ninth International Conference on Language Resources and Evaluation (LREC'14), Reykjavik, 26 May 2014 - 31 May 2014.

Abstract

Recent years have seen an increased interest in and availability of parallel corpora. Large corpora from international organizations (e.g. European Union, United Nations, European Patent Office), or from multilingual Internet sites (e.g. OpenSubtitles) are now easily available and are used for statistical machine translation but also for online search by different user groups. This paper gives an overview of different usages and different types of search systems. In the past, parallel corpus search systems were based on sentence-aligned corpora. We argue that automatic word alignment allows for major innovations in searching parallel corpora. Some online query systems already employ word alignment for sorting translation variants, but none supports the full query functionality that has been developed for parallel treebanks. We propose to develop such a system for efficiently searching large parallel corpora with a powerful query language.

Statistics

Citations

1 citation in Web of Science®
1 citation in Scopus®
3 citations in Microsoft Academic

Downloads

175 downloads since deposited on 14 Jul 2014
57 downloads in the past 12 months

Additional indexing

Item Type: Conference or Workshop Item (Paper), refereed, original work
Communities & Collections: 06 Faculty of Arts > Institute of Computational Linguistics; 08 University Research Priority Programs > Language and Space
Dewey Decimal Classification: 000 Computer science, knowledge & systems; 410 Linguistics
Language: English
Event End Date: 31 May 2014
Deposited On: 14 Jul 2014 07:01
Last Modified: 14 Feb 2018 21:22
Publisher: European Language Resources Association (ELRA)
ISBN: 978-2-9517408-8-4
OA Status: Green
Official URL: http://www.lrec-conf.org/proceedings/lrec2014/pdf/504_Paper.pdf
Related URLs: http://www.lrec-conf.org/proceedings/lrec2014/summaries/504.html

Download

Download PDF: 'Innovations in parallel corpus search tools'
Content: Published Version
Filetype: PDF
Size: 306kB