Publication:

Modeling Orthographic Variation in Occitan’s Dialects

Date

2024
Conference or Workshop Item
Published version
cris.lastimport.scopus: 2025-06-26T03:39:52Z
dc.contributor.institution: University of Zurich
dc.date.accessioned: 2024-08-21T12:31:06Z
dc.date.available: 2024-08-21T12:31:06Z
dc.date.issued: 2024-06-20
dc.description.abstract

Effectively normalizing spellings in textual data poses a considerable challenge, especially for low-resource languages lacking standardized writing systems. In this study, we fine-tuned a multilingual model with data from several Occitan dialects and conducted a series of experiments to assess the model’s representations of these dialects. For evaluation purposes, we compiled a parallel lexicon encompassing four Occitan dialects. Intrinsic evaluations of the model’s embeddings revealed that surface similarity between the dialects strengthened representations. When the model was further fine-tuned for part-of-speech tagging, its performance was robust to dialectical variation, even when trained solely on part-of-speech data from a single dialect. Our findings suggest that large multilingual models minimize the need for spelling normalization during pre-processing.

dc.identifier.doi: 10.18653/v1/2024.vardial-1.6
dc.identifier.scopus: 2-s2.0-105000821890
dc.identifier.uri: https://www.zora.uzh.ch/handle/20.500.14742/220404
dc.language.iso: eng
dc.subject.ddc: 410 Linguistics
dc.subject.ddc: 000 Computer science, knowledge & systems
dc.title

Modeling Orthographic Variation in Occitan’s Dialects

dc.type: conference_item
dcterms.accessRights: info:eu-repo/semantics/openAccess
dcterms.bibliographicCitation.pageend: 88
dcterms.bibliographicCitation.pagestart: 78
dcterms.bibliographicCitation.url: https://aclanthology.org/2024.vardial-1.6/
dspace.entity.type: Publication (en)
oairecerif.event.country: Mexico
oairecerif.event.endDate: 2024-06-20
oairecerif.event.place: Mexico City
oairecerif.event.startDate: 2024-06-20
uzh.contributor.author: Hopton, Zachary
uzh.contributor.author: Aepli, Noëmi
uzh.contributor.correspondence: Yes
uzh.contributor.correspondence: No
uzh.document.availability: published_version
uzh.eprint.datestamp: 2024-08-21 12:31:06
uzh.eprint.lastmod: 2024-09-01 20:55:15
uzh.eprint.statusChange: 2024-08-21 12:31:06
uzh.event.presentationType: paper
uzh.event.title: Eleventh Workshop on NLP for Similar Languages, Varieties, and Dialects (VarDial 2024)
uzh.event.type: workshop
uzh.harvester.eth: Yes
uzh.harvester.nb: No
uzh.identifier.doi: 10.5167/uzh-261057
uzh.oastatus.unpaywall: green
uzh.oastatus.zora: Green
uzh.publication.citation: Hopton, Zachary; Aepli, Noëmi (2024). Modeling Orthographic Variation in Occitan’s Dialects. In: Eleventh Workshop on NLP for Similar Languages, Varieties, and Dialects (VarDial 2024), Mexico City, Mexico, 20 June 2024, 78-88.
uzh.publication.freeAccessAt: UNSPECIFIED
uzh.publication.originalwork: original
uzh.publication.publishedStatus: final
uzh.scopus.impact: 0
uzh.workflow.doaj: uzh.workflow.doaj.false
uzh.workflow.eprintid: 261057
uzh.workflow.fulltextStatus: public
uzh.workflow.revisions: 13
uzh.workflow.rightsCheck: keininfo
uzh.workflow.status: archive
Files

Original bundle

Name:
2024.vardial_1.6.pdf
Size:
1.41 MB
Format:
Adobe Portable Document Format