CLUZH at VarDial GDI 2017: Testing a Variety of Machine Learning Tools for the Classification of Swiss German Dialects


Clematide, Simon; Makarov, Peter (2017). CLUZH at VarDial GDI 2017: Testing a Variety of Machine Learning Tools for the Classification of Swiss German Dialects. In: Fourth Workshop on NLP for Similar Languages, Varieties and Dialects, Valencia, 3 April 2017. Association for Computational Linguistics, 170-177.

Abstract

Our submissions for the GDI 2017 Shared Task are the results of three different types of classifiers: Naive Bayes, Conditional Random Fields (CRF), and Support Vector Machine (SVM). Our CRF-based run achieves a weighted F1 score of 65% (third rank), 0.9% below the best system. Measured by classification accuracy, our ensemble run (Naive Bayes, CRF, SVM) reaches 67% (second rank), 1% below the best system. We also describe our experiments with Recurrent Neural Network (RNN) architectures. Since they performed worse than our non-neural approaches, we did not include them in the submission.

Statistics

Downloads

87 downloads since deposited on 20 Feb 2018
4 downloads in the past 12 months

Additional indexing

Item Type: Conference or Workshop Item (Paper), refereed, original work
Communities & Collections: 06 Faculty of Arts > Institute of Computational Linguistics
Dewey Decimal Classification: 000 Computer science, knowledge & systems; 410 Linguistics
Uncontrolled Keywords: dialect identification, machine learning
Language: English
Event End Date: 3 April 2017
Deposited On: 20 Feb 2018 16:58
Last Modified: 10 Feb 2022 08:16
Publisher: Association for Computational Linguistics
Funders: European Research Council Grant No. 338875
OA Status: Green
Free access at: Official URL. An embargo period may apply.
Official URL: http://www.aclweb.org/anthology/W17-1221
  • Content: Published Version
  • Language: English