Publication: Modeling Orthographic Variation in Occitan’s Dialects
| cris.lastimport.scopus | 2025-06-26T03:39:52Z | |
| dc.contributor.institution | University of Zurich | |
| dc.date.accessioned | 2024-08-21T12:31:06Z | |
| dc.date.available | 2024-08-21T12:31:06Z | |
| dc.date.issued | 2024-06-20 | |
| dc.description.abstract | Effectively normalizing spellings in textual data poses a considerable challenge, especially for low-resource languages lacking standardized writing systems. In this study, we fine-tuned a multilingual model with data from several Occitan dialects and conducted a series of experiments to assess the model’s representations of these dialects. For evaluation purposes, we compiled a parallel lexicon encompassing four Occitan dialects. Intrinsic evaluations of the model’s embeddings revealed that surface similarity between the dialects strengthened representations. When the model was further fine-tuned for part-of-speech tagging, its performance was robust to dialectical variation, even when trained solely on part-of-speech data from a single dialect. Our findings suggest that large multilingual models minimize the need for spelling normalization during pre-processing. | |
| dc.identifier.doi | 10.18653/v1/2024.vardial-1.6 | |
| dc.identifier.scopus | 2-s2.0-105000821890 | |
| dc.identifier.uri | https://www.zora.uzh.ch/handle/20.500.14742/220404 | |
| dc.language.iso | eng | |
| dc.subject.ddc | 410 Linguistics | |
| dc.subject.ddc | 000 Computer science, knowledge & systems | |
| dc.title | Modeling Orthographic Variation in Occitan’s Dialects | |
| dc.type | conference_item | |
| dcterms.accessRights | info:eu-repo/semantics/openAccess | |
| dcterms.bibliographicCitation.pageend | 88 | |
| dcterms.bibliographicCitation.pagestart | 78 | |
| dcterms.bibliographicCitation.url | https://aclanthology.org/2024.vardial-1.6/ | |
| dspace.entity.type | Publication | en |
| oairecerif.event.country | Mexico | |
| oairecerif.event.endDate | 2024-06-20 | |
| oairecerif.event.place | Mexico City | |
| oairecerif.event.startDate | 2024-06-20 | |
| uzh.contributor.author | Hopton, Zachary | |
| uzh.contributor.author | Aepli, Noëmi | |
| uzh.contributor.correspondence | Yes | |
| uzh.contributor.correspondence | No | |
| uzh.document.availability | published_version | |
| uzh.eprint.datestamp | 2024-08-21 12:31:06 | |
| uzh.eprint.lastmod | 2024-09-01 20:55:15 | |
| uzh.eprint.statusChange | 2024-08-21 12:31:06 | |
| uzh.event.presentationType | paper | |
| uzh.event.title | Eleventh Workshop on NLP for Similar Languages, Varieties, and Dialects (VarDial 2024) | |
| uzh.event.type | workshop | |
| uzh.harvester.eth | Yes | |
| uzh.harvester.nb | No | |
| uzh.identifier.doi | 10.5167/uzh-261057 | |
| uzh.oastatus.unpaywall | green | |
| uzh.oastatus.zora | Green | |
| uzh.publication.citation | Hopton, Zachary; Aepli, Noëmi (2024). Modeling Orthographic Variation in Occitan’s Dialects. In: Eleventh Workshop on NLP for Similar Languages, Varieties, and Dialects (VarDial 2024), Mexico City, Mexico, 20 June 2024, 78-88. | |
| uzh.publication.freeAccessAt | UNSPECIFIED | |
| uzh.publication.originalwork | original | |
| uzh.publication.publishedStatus | final | |
| uzh.scopus.impact | 0 | |
| uzh.workflow.doaj | uzh.workflow.doaj.false | |
| uzh.workflow.eprintid | 261057 | |
| uzh.workflow.fulltextStatus | public | |
| uzh.workflow.revisions | 13 | |
| uzh.workflow.rightsCheck | keininfo | |
| uzh.workflow.status | archive | |