ZORA (Zurich Open Repository and Archive)

Representing variation in a spoken corpus of an endangered dialect: the case of Torlak

Vuković, Teodora (2021). Representing variation in a spoken corpus of an endangered dialect: the case of Torlak. Language Resources and Evaluation, 55(3):731-756.

Abstract

The paper presents a spoken corpus of the endangered Torlak dialect from the Timok area of Southeast Serbia. The dialect shows considerable variation in the use of non-standard features under the influence of standard Serbian (SSr). To account for this variation, a specific methodology was selected for collection, sampling, transcription and annotation. Between 2015 and 2017, semi-structured interviews were conducted in the field, eliciting spontaneous speech in the form of long narratives about traditional culture and history. The corpus comprises 500,697 tokens of semi-orthographic transcripts representing 80 hours of recordings from locations evenly distributed across the Timok area of the Torlak dialect zone, thus enabling spatial contrastive analysis. The majority of speakers in the corpus are older people whose language represents the highly non-standard variety. To allow for analysis of language change under the influence of SSr, the corpus also includes a number of younger people whose speech is closer to SSr. Tools for automatic PoS annotation and lemmatization, which were previously lacking, were developed on the basis of existing resources for SSr. For tagger training, a dialect sample of 27,000 manually verified tokens was merged with an existing training set for SSr.

Additional indexing

Item Type: Journal Article, refereed, original work
Communities & Collections: 06 Faculty of Arts > Institute of Slavonic Studies
Dewey Decimal Classification: 490 Other languages; 410 Linguistics
Scopus Subject Areas: Social Sciences & Humanities > Language and Linguistics; Social Sciences & Humanities > Education; Social Sciences & Humanities > Linguistics and Language; Social Sciences & Humanities > Library and Information Sciences
Uncontrolled Keywords: Linguistics and Language, Education, Library and Information Sciences, Language and Linguistics
Language: English
Date: 1 September 2021
Deposited On: 15 Jan 2021 10:49
Last Modified: 24 Aug 2024 01:40
Publisher: Springer
ISSN: 1574-020X
OA Status: Hybrid
Free access at: Publisher DOI. An embargo period may apply.
Publisher DOI: https://doi.org/10.1007/s10579-020-09522-4
Official URL: https://link.springer.com/article/10.1007/s10579-020-09522-4
Related URLs: https://www.clarin.si/repository/xmlui/handle/11356/1281 (Research Data)
Project Information:
  • Funder: SNSF
  • Grant ID: IZRPZ0_177557
  • Project Title: (Dis-)entangling traditions on the Central Balkans: Performance and perception (TraCeBa)
  • Funder: FP7
  • Grant ID: 200307
  • Project Title:
Full text (PDF):
  • Content: Published Version
  • Language: English
  • Licence: Creative Commons: Attribution 4.0 International (CC BY 4.0)

Statistics

Citations

5 citations in Web of Science®
6 citations in Scopus®

Downloads

74 downloads since deposited on 15 Jan 2021
2 downloads in the past 12 months