Crowdsourcing Swiss Dialect Transcriptions for Assessing Factors in Writing Variations


Clematide, Simon; Frick, Karina; Aepli, Noëmi; Goldman, Jean-Philippe (2016). Crowdsourcing Swiss Dialect Transcriptions for Assessing Factors in Writing Variations. In: Proceedings of the 13th Conference on Natural Language Processing (KONVENS), Bochum, Germany, 19–21 September 2016. Universitätsverlag Ruhr-Universität Bochum, 62-67.

Abstract

In this paper, we systematically analyze writing variations of Swiss German in two existing corpora with standard German glosses, a corpus of 10,000 short text messages and a corpus of transcribed oral history recordings (90,000 tokens). We show that neither resource is sufficient for assessing factors in writing variations of users and describe a data collection project involving a citizen science community for solving this problem. Laymen will independently and redundantly transcribe 1,200 short samples (15-20 seconds) of audio material in Swiss German according to their own best practice.

Additional indexing

Item Type: Conference or Workshop Item (Speech), refereed, original work
Communities & Collections: 06 Faculty of Arts > Institute of Computational Linguistics; 06 Faculty of Arts > Zurich Center for Linguistics
Dewey Decimal Classification: 000 Computer science, knowledge & systems; 410 Linguistics; 430 German & related languages
Uncontrolled Keywords: Citizen Science, Swiss German, Non-Standard Orthography
Language: English
Event End Date: 21 September 2016
Deposited On: 21 Dec 2016 16:51
Last Modified: 15 Sep 2021 13:55
Publisher: Universitätsverlag Ruhr-Universität Bochum
Series Name: Bochumer Linguistische Arbeitsberichte
ISSN: 2190-0949
Funders: SNF CRAGP1_164811/1
OA Status: Green
Official URL: https://www.linguistics.rub.de/bla/016-konvens2016.pdf
Project Information:
  • Funder: SNSF
  • Grant ID:
  • Project Title: SNF CRAGP1_164811/1
  • Content: Published Version
  • Publisher License