Normalising orthographic and dialectal variants for the automatic processing of Swiss German


Samardžić, Tanja; Scherrer, Yves; Glaser, Elvira (2015). Normalising orthographic and dialectal variants for the automatic processing of Swiss German. In: Language and Technology Conference: Human Language Technologies as a Challenge for Computer Science and Linguistics, Poznan, Poland, 27 November 2015 - 29 November 2015, 294-298.

Abstract

Swiss dialects of German are, unlike most dialects of well-standardised languages, widely used in everyday communication. Despite this, they lack tools and resources for natural language processing, mainly because the dialects are mostly spoken and the available written resources are small and highly inconsistent. This paper addresses the great variability in writing, which poses a problem for automatic processing. We propose an automatic approach to normalising the variants to a single representation intended for the internal use of processing tools (not shown to human users). We manually create a sample of transcribed and normalised texts, which we use to train and test three methods based on machine translation: word-by-word mappings, character-based machine translation, and language modelling. We show that an optimal combination of the three approaches gives better results than any of them separately.
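The abstract names three components (word-by-word mappings, character-based machine translation, and language modelling) and a combination of them. The following is a minimal sketch of how such a combination might be wired together, under stated assumptions; it is not the authors' implementation. All function names are hypothetical. The bigram counter is a crude stand-in for a real language model, the hand-written substring rules in `char_fallback` are a plainly swapped-in substitute for character-based machine translation, and the toy word pairs are invented for illustration, not taken from the paper's corpus.

```python
from collections import Counter, defaultdict


def train_word_mappings(pairs):
    """Count how often each transcribed word is normalised to each target form."""
    table = defaultdict(Counter)
    for source, target in pairs:
        table[source][target] += 1
    return table


def train_bigrams(normalised_sentences):
    """Count bigrams over normalised text: a crude stand-in for a language model."""
    bigrams = Counter()
    for sentence in normalised_sentences:
        padded = ["<s>"] + sentence
        for prev, cur in zip(padded, padded[1:]):
            bigrams[(prev, cur)] += 1
    return bigrams


def char_fallback(word, rules):
    """Hypothetical substitute for character-level MT: ordered substring rewrites."""
    for old, new in rules:
        word = word.replace(old, new)
    return word


def normalise(tokens, table, bigrams, rules):
    """Normalise a token sequence left to right."""
    output = []
    prev = "<s>"
    for token in tokens:
        if token in table:
            candidates = table[token]
            # Known word: prefer the candidate seen most often after the previous
            # output word; break ties by word-mapping frequency.
            best = max(candidates, key=lambda c: (bigrams[(prev, c)], candidates[c]))
        else:
            # Unseen word: fall back to the character-level rewrite rules.
            best = char_fallback(token, rules)
        output.append(best)
        prev = best
    return output


# Toy data invented for illustration; not taken from the paper's corpus.
pairs = [("iisch", "isch"), ("isch", "isch"), ("de", "de"), ("de", "de"), ("de", "dä")]
sentences = [["de", "isch"], ["dä", "isch"], ["dä", "maa"]]
rules = [("ii", "i")]

table = train_word_mappings(pairs)
bigrams = train_bigrams(sentences)
print(normalise(["de", "iisch", "wiis"], table, bigrams, rules))
# → ['dä', 'isch', 'wis']
```

In this toy run the bigram counts override the more frequent word mapping for the first token, and the unseen word is handled by the fallback rules. The paper's actual models and combination scheme are described in the published PDF.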

Statistics

Downloads

12 downloads since deposit on 07 Oct 2016
9 downloads in the past 12 months

Additional indexing

Item Type: Conference or Workshop Item (Paper), refereed, original work
Communities & Collections: 06 Faculty of Arts > Institute of German Studies; 08 University Research Priority Programs > Language and Space
Dewey Decimal Classification: 430 German & related languages
Language: German
Event End Date: 29 November 2015
Deposited On: 07 Oct 2016 08:17
Last Modified: 05 Sep 2017 02:22
Publisher: s.n.
Series Name: Proceedings of the 7th Language and Technology Conference
Related URLs: http://ltc.amu.edu.pl/ (Organisation)

Download

Download PDF: 'Normalising orthographic and dialectal variants for the automatic processing of Swiss German'.
Content: Published Version
Filetype: PDF
Size: 70kB