Drawing areal information from a corpus of noisy dialect data


Glaser, Elvira; Lameli, Alfred; Stoeckle, Philipp (2020). Drawing areal information from a corpus of noisy dialect data. Journal of Linguistic Geography, 8(1):1-18.

Abstract

This article is an analysis of linguistic survey data representing German dialects in Switzerland in 1933/34 based on the so-called Wenker sentences. The data are impressionistic in terms of applied phonetic transcriptions, which were produced by non-specialists using the Latin alphabet. Due to the lack of pre-defined standardization, the phonetic transcriptions are very heterogeneous. From a technical perspective, this leads to very noisy data, which is why the validity of the Wenker data in general and the Swiss Wenker data in particular has been questioned. Using methods from computational linguistics, we compare, for the first time, Wenker data with linguistic data collected at virtually the same time by linguistics professionals. Direct comparison with a sample from the published atlas of German-speaking Switzerland (SDS) reveals that despite the noisiness of the data, they nevertheless provide reliable information, e.g., in terms of the spatial structuring of Swiss dialects. The study is thus a successful pilot for other corpus-based studies dealing with unstructured Wenker data in other regions.

Additional indexing

Item Type: Journal Article, not_refereed, original work
Communities & Collections:
    06 Faculty of Arts > Institute of German Studies
    08 Research Priority Programs > Language and Space
Dewey Decimal Classification: 430 German & related languages
Language: English
Date: 2020
Deposited On: 29 Jan 2021 10:47
Last Modified: 29 Jan 2021 12:22
Publisher: Cambridge University Press
ISSN: 2049-7547
OA Status: Closed
Publisher DOI: https://doi.org/10.1017/jlg.2020.4
Related URLs:
    https://www.cambridge.org/core/journals/journal-of-linguistic-geography/article/drawing-areal-information-from-a-corpus-of-noisy-dialect-data/08FEE910207B77B4AE9AD969A27A0D1D (Publisher)
    https://uzb.swisscovery.slsp.ch/permalink/41SLSP_UZB/rloemb/alma99116751559605508 (Library Catalogue)
