ZORA (Zurich Open Repository and Archive)

NusaX: Multilingual Parallel Sentiment Dataset for 10 Indonesian Local Languages

Winata, Genta Indra; Aji, Alham Fikri; Cahyawijaya, Samuel; Mahendra, Rahmad; Koto, Fajri; Romadhony, Ade; Kurniawan, Kemal; Moeljadi, David; Prasojo, Radityo Eko; Fung, Pascale; Baldwin, Timothy; Lau, Jey Han; Sennrich, Rico; Ruder, Sebastian (2023). NusaX: Multilingual Parallel Sentiment Dataset for 10 Indonesian Local Languages. In: Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics, Dubrovnik, Croatia, 2 May 2023 - 6 May 2023. Association for Computational Linguistics, 815-834.

Abstract

Natural language processing (NLP) has a significant impact on society via technologies such as machine translation and search engines. Despite its success, NLP technology is only widely available for high-resource languages such as English and Chinese, while it remains inaccessible to many languages due to the unavailability of data resources and benchmarks. In this work, we focus on developing resources for languages in Indonesia. Although Indonesia is the second most linguistically diverse country in the world, most of its languages are categorized as endangered and some are even extinct. We develop the first-ever parallel resource for 10 low-resource languages in Indonesia. Our resource includes sentiment and machine translation datasets, and bilingual lexicons. We provide extensive analyses and describe the challenges of creating such resources. We hope this work can spark NLP research on Indonesian and other underrepresented languages.

Additional indexing

Item Type: Conference or Workshop Item (Paper), original work
Communities & Collections: 06 Faculty of Arts > Institute of Computational Linguistics
06 Faculty of Arts > Zurich Center for Linguistics
Dewey Decimal Classification: 000 Computer science, knowledge & systems
410 Linguistics
Scopus Subject Areas: Physical Sciences > Computational Theory and Mathematics
Physical Sciences > Software
Social Sciences & Humanities > Linguistics and Language
Language: English
Event End Date: 6 May 2023
Deposited On: 28 Jul 2023 10:54
Last Modified: 20 Jun 2024 09:55
Publisher: Association for Computational Linguistics
OA Status: Hybrid
Publisher DOI: https://doi.org/10.18653/v1/2023.eacl-main.57
Full text (PDF): 'NusaX: Multilingual Parallel Sentiment Dataset for 10 Indonesian Local Languages'
  • Content: Published Version
  • Language: English
  • Licence: Creative Commons: Attribution 4.0 International (CC BY 4.0)


Downloads

7 downloads since deposit on 28 Jul 2023
7 downloads in the past 12 months