Digitising Swiss German: how to process and study a polycentric spoken language


Scherrer, Yves; Samardžić, Tanja; Glaser, Elvira (2019). Digitising Swiss German: how to process and study a polycentric spoken language. Language Resources and Evaluation, 53(4):735-769.

Abstract

Swiss dialects of German are, unlike many dialects of other standardised languages, widely used in everyday communication. Despite this fact, automatic processing of Swiss German is still a considerable challenge due to the fact that it is mostly a spoken variety and that it is subject to considerable regional variation. This paper presents the ArchiMob corpus, a freely available general-purpose corpus of spoken Swiss German based on oral history interviews. The corpus is a result of a long design process, intensive manual work and specially adapted computational processing. We first present the modalities of access of the corpus for linguistic, historic and computational research. We then describe how the documents were transcribed, segmented and aligned with the sound source. This work involved a series of experiments that have led to automatically annotated normalisation and part-of-speech tagging layers. Finally, we present several case studies to motivate the use of the corpus for digital humanities in general and for dialectology in particular.

Statistics

Citations

1 citation in Web of Science®
2 citations in Scopus®

Downloads

1 download since deposited on 16 Dec 2019
1 download in the past 12 months

Additional indexing

Item Type: Journal Article, refereed, original work
Communities & Collections: 06 Faculty of Arts > Institute of German Studies
Dewey Decimal Classification: 430 German & related languages
Scopus Subject Areas: Social Sciences & Humanities > Language and Linguistics; Social Sciences & Humanities > Education; Social Sciences & Humanities > Linguistics and Language; Social Sciences & Humanities > Library and Information Sciences
Language: English
Date: December 2019
Deposited On: 16 Dec 2019 16:27
Last Modified: 29 Jul 2020 12:25
Publisher: Springer
ISSN: 1574-020X
OA Status: Closed
Publisher DOI: https://doi.org/10.1007/s10579-019-09457-5
Related URLs: https://recherche.nebis.ch/permalink/f/1pa9ss3/ebi01_prod010849195 (Library Catalogue)

Download

Closed Access: Download allowed only for UZH members

Content: Published Version
Language: English
Filetype: PDF - Registered users only until 31 December 2020
Size: 1MB
Embargo until: 2020-12-31