Detecting Code-Switching in a Multilingual Alpine Heritage Corpus


Volk, Martin; Clematide, Simon (2014). Detecting Code-Switching in a Multilingual Alpine Heritage Corpus. In: Proceedings of the First Workshop on Computational Approaches to Code Switching, Doha, Qatar, 25 October 2014, 24-33.

Abstract

This paper describes experiments in detecting and annotating code-switching in a large multilingual diachronic corpus of Swiss Alpine texts. The texts are in English, French, German, Italian, Romansh and Swiss German. Because of the multilingual authors (mountaineers, scientists) and the assumed multilingual readers, the texts contain numerous code-switching elements. When building and annotating the corpus, we faced issues of language identification on the sentence and sub-sentential level. We present our strategy for language identification and for the annotation of foreign language fragments within sentences. We report 78% precision on detecting a subset of code-switches with correct language labels and 92% unlabeled precision.

Downloads

134 downloads since deposited on 11 Nov 2014
65 downloads since 12 months

Additional indexing

Item Type: Conference or Workshop Item (Paper), refereed, original work
Communities & Collections: 06 Faculty of Arts > Institute of Computational Linguistics; 08 University Research Priority Programs > Language and Space
Dewey Decimal Classification: 000 Computer science, knowledge & systems; 410 Linguistics
Language: English
Event End Date: 25 October 2014
Deposited On: 11 Nov 2014 15:11
Last Modified: 14 Aug 2017 18:37
Publisher: Association for Computational Linguistics
ISBN: 978-1-937284-96-1
Funders: Swiss National Science Foundation grant CRSII2_147653/1: MODERN: Modelling discourse entities and relations for coherent machine translation
Free access at: Official URL. An embargo period may apply.
Official URL: http://www.aclweb.org/anthology/W14-39

Download

Content: Accepted Version
Language: English
Filetype: PDF
Size: 349kB