Parallel treebanking Spanish-Quechua: How and how well do they align?


Rios, A; Göhring, A; Volk, M (2012). Parallel treebanking Spanish-Quechua: How and how well do they align? In: The 10th International Workshop on Treebanks and Linguistic Theories (TLT10), Heidelberg, Germany, 6 January 2012 - 7 January 2012, online.

Abstract

Parallel treebanking is greatly facilitated by automatic word alignment. We are building a trilingual treebank for German, Spanish and Quechua. We ran several alignment experiments on parallel Spanish-Quechua texts, measured the alignment quality, and compared the results with the figures we obtained when aligning a comparable corpus of Spanish-German texts. This preliminary work has shown us which word segmentation of the agglutinative language Quechua works best for alignment. We also gained a first impression of how well Quechua can be aligned to Spanish, an important prerequisite for bilingual lexicon extraction, parallel treebanking, or statistical machine translation.

Additional indexing

Item Type: Conference or Workshop Item (Paper), refereed, original work
Communities & Collections: 06 Faculty of Arts > Institute of Computational Linguistics
Dewey Decimal Classification: 000 Computer science, knowledge & systems; 410 Linguistics
Language: English
Event End Date: 7 January 2012
Deposited On: 29 Mar 2012 06:48
Last Modified: 13 Aug 2017 05:52
Publisher: CSLI Publications
Series Name: Linguistic Issues in Language Technology
Number: 7/13
ISSN: 1945-3590
Official URL: http://elanguage.net/journals/lilt/article/view/2695
Related URLs: http://tlt10.cl.uni-heidelberg.de/ (Publisher)

Download

Content: Published Version
Filetype: PDF
Size: 379kB