ZORA (Zurich Open Repository and Archive)

Modeling Orthographic Variation in Occitan’s Dialects

Hopton, Zachary; Aepli, Noëmi (2024). Modeling Orthographic Variation in Occitan’s Dialects. In: Eleventh Workshop on NLP for Similar Languages, Varieties, and Dialects (VarDial 2024), Mexico City, Mexico, 20 June 2024, 78-88.

Abstract

Effectively normalizing spellings in textual data poses a considerable challenge, especially for low-resource languages lacking standardized writing systems. In this study, we fine-tuned a multilingual model with data from several Occitan dialects and conducted a series of experiments to assess the model’s representations of these dialects. For evaluation purposes, we compiled a parallel lexicon encompassing four Occitan dialects. Intrinsic evaluations of the model’s embeddings revealed that surface similarity between the dialects strengthened representations. When the model was further fine-tuned for part-of-speech tagging, its performance was robust to dialectical variation, even when trained solely on part-of-speech data from a single dialect. Our findings suggest that large multilingual models minimize the need for spelling normalization during pre-processing.

Additional indexing

Item Type: Conference or Workshop Item (Paper), not_refereed, original work
Communities & Collections: 06 Faculty of Arts > Institute of Computational Linguistics
Dewey Decimal Classification: 410 Linguistics; 000 Computer science, knowledge & systems
Language: English
Event End Date: 20 June 2024
Deposited On: 21 Aug 2024 12:31
Last Modified: 01 Sep 2024 20:55
OA Status: Green
Publisher DOI: https://doi.org/10.18653/v1/2024.vardial-1.6
Official URL: https://aclanthology.org/2024.vardial-1.6/
PDF: Published Version, English
