Morphological analysis and lemmatization for Swiss German using weighted transducers


Baumgartner, Reto (2016). Morphological analysis and lemmatization for Swiss German using weighted transducers. In: Proceedings of the 13th Conference on Natural Language Processing (KONVENS), Bochum, Germany, 19-21 September 2016, 44-49.

Abstract

With written Swiss German becoming more popular in everyday use, it has become a target for text processing. The absence of a standard orthography and the variety of dialects, however, lead to vast variation in spelling, which makes this task difficult. We built a system based on weighted transducers that recognizes over 90% of the tokens in certain texts. The weights ensure that the best analysis is preferred for most words while still allowing for a very broad range of spelling variations. The morphological tagset that we defined for this purpose and the lemmas in Standard German open the possibility for further processing. Besides our morphological analyzer and lemmatizer, a morphologically annotated corpus offers new resources for Swiss German and helps spread our tagset.

Additional indexing

Item Type: Conference or Workshop Item (Paper), refereed, original work
Communities & Collections: 06 Faculty of Arts > Institute of Computational Linguistics
Dewey Decimal Classification: 000 Computer science, knowledge & systems; 410 Linguistics
Language: English
Event End Date: 21 September 2016
Deposited On: 29 Oct 2020 15:16
Last Modified: 29 Oct 2020 20:30
Publisher: Ruhr-Universität Bochum
Series Name: Bochumer Linguistische Arbeitsberichte
ISSN: 2190-0949
OA Status: Green