CLUZH at SIGMORPHON 2022 Shared Tasks on Morpheme Segmentation and Inflection Generation


Wehrli, Silvan; Clematide, Simon; Makarov, Peter (2022). CLUZH at SIGMORPHON 2022 Shared Tasks on Morpheme Segmentation and Inflection Generation. In: Proceedings of the 19th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology, Seattle, Washington, 14 July 2022. Association for Computational Linguistics, 212-219.

Abstract

This paper describes the submissions of the team of the Department of Computational Linguistics, University of Zurich, to the SIGMORPHON 2022 Shared Tasks on Morpheme Segmentation and Inflection Generation. Our submissions use a character-level neural transducer that operates over traditional edit actions. While this model has been found particularly well suited for low-resource settings, using it with large data quantities has been difficult. Existing implementations could not fully profit from GPU acceleration and did not efficiently implement mini-batch training, which can be tricky for a transition-based system. For this year’s submission, we have ported the neural transducer to PyTorch and implemented true mini-batch training. This has allowed us to successfully scale the approach to large data quantities and conduct extensive experimentation. We report competitive results for morpheme segmentation (including sharing first place in part 2 of the challenge). We also demonstrate that reducing sentence-level morpheme segmentation to a word-level problem is a simple yet effective strategy. Additionally, we report strong results in inflection generation (the overall best result for large training sets in part 1 and the best results in low-resource learning trajectories in part 2). Our code is publicly available.
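
To illustrate what "operating over traditional edit actions" can mean, the following is a minimal sketch and not the authors' implementation. It assumes a hypothetical action set (copy, delete, insert, substitute) and shows only how a given action sequence rewrites an input string character by character; in the system described above, a neural model would predict such actions. The names `Action` and `apply_actions` are invented for this example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Action:
    """One edit action (hypothetical representation, for illustration only)."""
    kind: str                   # "copy", "delete", "insert", or "substitute"
    char: Optional[str] = None  # output character for insert/substitute


def apply_actions(source: str, actions: List[Action]) -> str:
    """Apply edit actions left to right and return the output string."""
    out: List[str] = []
    i = 0  # position of the next unread input character
    for action in actions:
        if action.kind not in {"copy", "delete", "insert", "substitute"}:
            raise ValueError(f"unknown action kind: {action.kind!r}")
        if action.kind in {"insert", "substitute"} and action.char is None:
            raise ValueError(f"{action.kind} needs an output character")
        if action.kind == "insert":
            # Insert writes a character without consuming any input.
            out.append(action.char)
            continue
        if i >= len(source):
            raise ValueError("no input left to consume")
        if action.kind == "copy":
            out.append(source[i])    # consume input, write it unchanged
        elif action.kind == "substitute":
            out.append(action.char)  # consume input, write a replacement
        # "delete" consumes the input character and writes nothing.
        i += 1
    if i != len(source):
        raise ValueError("action sequence did not consume the whole input")
    return "".join(out)


# Inflection-style example: "sing" -> "sang"
sing_to_sang = [
    Action("copy"),             # s
    Action("substitute", "a"),  # i -> a
    Action("copy"),             # n
    Action("copy"),             # g
]
print(apply_actions("sing", sing_to_sang))
```

Under the same assumptions, a suffixing pattern such as "walk" to "walked" would be four copy actions followed by two insert actions.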

Statistics

Downloads

35 downloads since deposited on 18 Feb 2023
29 downloads since 12 months

Additional indexing

Item Type: Conference or Workshop Item (Paper), refereed, original work
Communities & Collections: 06 Faculty of Arts > Institute of Computational Linguistics
Dewey Decimal Classification: 000 Computer science, knowledge & systems; 410 Linguistics
Scopus Subject Areas: Physical Sciences > Artificial Intelligence; Physical Sciences > Human-Computer Interaction; Social Sciences & Humanities > Linguistics and Language
Language: English
Event End Date: 14 July 2022
Deposited On: 18 Feb 2023 16:10
Last Modified: 19 Feb 2023 21:00
Publisher: Association for Computational Linguistics
OA Status: Hybrid
Free access at: Publisher DOI. An embargo period may apply.
Publisher DOI: https://doi.org/10.18653/v1/2022.sigmorphon-1.21
Official URL: https://aclanthology.org/2022.sigmorphon-1.21/
  • Content: Published Version
  • Language: English
  • Licence: Creative Commons: Attribution 4.0 International (CC BY 4.0)