ZORA (Zurich Open Repository and Archive)

A Tulu Resource for Machine Translation

Narayanan, Manu; Aepli, Noëmi (2024). A Tulu Resource for Machine Translation. In: Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), Turin, 20 May 2024 - 25 May 2024, 1756-1767.

Abstract

We present the first parallel dataset for English–Tulu translation. Tulu, a language of the South Dravidian branch of the Dravidian family, is spoken predominantly in southwestern India by approximately 2.5 million people. We construct the dataset by adding human translations into Tulu to the multilingual machine translation resource FLORES-200. We also use this dataset to evaluate our English–Tulu machine translation model. To train the model, we leverage resources available for related South Dravidian languages and adopt a transfer learning approach that exploits similarities between high-resource and low-resource languages. This method makes it possible to train a machine translation system even when no parallel data exists between the source and target language, thereby overcoming a significant obstacle in machine translation development for low-resource languages. Our English–Tulu system, trained without any parallel English–Tulu data, outperforms Google Translate by 19 BLEU points (as of September 2023). The dataset and code are available here: https://github.com/manunarayanan/Tulu-NMT.
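To make the "19 BLEU points" comparison concrete, the following is a minimal sketch of how a corpus-level BLEU score can be computed. It is a simplified illustration, not the authors' evaluation script: the function names `ngrams` and `corpus_bleu` are hypothetical, and whitespace tokenization, a single reference per sentence, and the absence of smoothing are all assumptions made here for brevity.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Simplified corpus-level BLEU (0-100), one reference per hypothesis.

    Assumptions (not taken from the paper): whitespace tokenization,
    no smoothing, uniform weights over n-gram orders 1..max_n.
    """
    matches = [0] * max_n  # clipped n-gram matches per order
    totals = [0] * max_n   # hypothesis n-gram counts per order
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_counts, r_counts = ngrams(h, n), ngrams(r, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, r_counts[g]) for g, c in h_counts.items())
            totals[n - 1] += sum(h_counts.values())
    # Without smoothing, any order with zero matches gives a score of zero.
    if hyp_len == 0 or min(matches) == 0:
        return 0.0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty punishes output that is shorter than the reference.
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity_penalty * math.exp(log_precision)
```

Under these assumptions, a difference in BLEU points between two systems is the difference between `corpus_bleu(system_a_outputs, references)` and `corpus_bleu(system_b_outputs, references)` on the same test set. Scores reported in papers are typically produced with a standardized tool, so values from this sketch are not directly comparable to published numbers.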

Additional indexing

Item Type: Conference or Workshop Item (Paper), not_refereed, original work
Communities & Collections: 06 Faculty of Arts > Institute of Computational Linguistics
Dewey Decimal Classification: 000 Computer science, knowledge & systems; 410 Linguistics
Scopus Subject Areas: Physical Sciences > Theoretical Computer Science; Physical Sciences > Computational Theory and Mathematics; Physical Sciences > Computer Science Applications
Language: English
Event End Date: 25 May 2024
Deposited On: 21 Aug 2024 12:29
Last Modified: 01 Sep 2024 21:02
OA Status: Green
Official URL: https://aclanthology.org/2024.lrec-main.155/
Full Text: PDF, Published Version, English
