ZORA (Zurich Open Repository and Archive)

LEXTREME: A Multi-Lingual and Multi-Task Benchmark for the Legal Domain

Niklaus, Joel; Matoshi, Veton; Rani, Pooja; Galassi, Andrea; Stürmer, Matthias; Chalkidis, Ilias (2023). LEXTREME: A Multi-Lingual and Multi-Task Benchmark for the Legal Domain. In: The 2023 Conference on Empirical Methods in Natural Language Processing, Singapore, Singapore, 6 December 2023 - 10 December 2023. Association for Computational Linguistics, 3016-3054.

Abstract

Lately, propelled by phenomenal advances around the transformer architecture, the legal NLP field has enjoyed spectacular growth. To measure progress, well-curated and challenging benchmarks are crucial. Previous efforts have produced numerous benchmarks for general NLP models, typically based on news or Wikipedia. However, these may not fit specific domains such as law, with its unique lexicons and intricate sentence structures. Even though there is a rising need to build NLP systems for languages other than English, many benchmarks are available only in English, and no multilingual benchmark exists in the legal NLP field. We survey the legal NLP literature and select 11 datasets covering 24 languages, creating LEXTREME. To fairly compare models, we propose two aggregate scores, i.e., a dataset aggregate score and a language aggregate score. Our results show that even the best baseline achieves only modest results, and ChatGPT also struggles with many tasks. This indicates that LEXTREME remains a challenging benchmark with ample room for improvement. To facilitate easy use for researchers and practitioners, we release LEXTREME on Hugging Face along with a public leaderboard and the necessary code to evaluate models. We also provide a public Weights and Biases project containing all runs for transparency.
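
As a rough illustration of how the two aggregate scores could be combined, the sketch below aggregates hypothetical per-(dataset, language) scores with harmonic means, first within each dataset or language and then across them. The dataset names, language codes, score values, and the choice of the harmonic mean are assumptions made here for illustration; the exact aggregation procedure is the one defined in the paper and the released evaluation code.

    from statistics import harmonic_mean

    # Hypothetical per-(dataset, language) scores for a single model.
    # Names and numbers are illustrative only, not results from the paper.
    scores = {
        ("brazilian_court_decisions", "pt"): 0.62,
        ("swiss_judgment_prediction", "de"): 0.70,
        ("swiss_judgment_prediction", "fr"): 0.68,
        ("swiss_judgment_prediction", "it"): 0.65,
        ("multi_eurlex", "en"): 0.58,
        ("multi_eurlex", "de"): 0.55,
    }

    def aggregate(scores, key_index):
        """Group scores by dataset (key_index=0) or language (key_index=1),
        then take the harmonic mean within each group and across groups."""
        groups = {}
        for key, value in scores.items():
            groups.setdefault(key[key_index], []).append(value)
        return harmonic_mean([harmonic_mean(v) for v in groups.values()])

    dataset_aggregate = aggregate(scores, key_index=0)    # dataset aggregate score
    language_aggregate = aggregate(scores, key_index=1)   # language aggregate score
    combined_score = harmonic_mean([dataset_aggregate, language_aggregate])

    print(f"dataset aggregate:  {dataset_aggregate:.3f}")
    print(f"language aggregate: {language_aggregate:.3f}")
    print(f"combined score:     {combined_score:.3f}")

The benchmark data itself is released on Hugging Face, so it can presumably be loaded with the datasets library's load_dataset function using the repository identifier and task configuration names listed on the public leaderboard.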

Additional indexing

Item Type: Conference or Workshop Item (Paper), refereed, original work
Communities & Collections: 03 Faculty of Economics > Department of Informatics
Dewey Decimal Classification: 000 Computer science, knowledge & systems
Scopus Subject Areas: Physical Sciences > Computational Theory and Mathematics; Physical Sciences > Computer Science Applications; Physical Sciences > Information Systems; Social Sciences & Humanities > Language and Linguistics; Social Sciences & Humanities > Linguistics and Language
Language: English
Event End Date: 10 December 2023
Deposited On: 02 Dec 2024 14:03
Last Modified: 03 Dec 2024 21:00
Publisher: Association for Computational Linguistics
Series Name: Findings of the Association for Computational Linguistics
OA Status: Hybrid
Free access at: Publisher DOI. An embargo period may apply.
Publisher DOI: https://doi.org/10.18653/v1/2023.findings-emnlp.200
Download PDF: 'LEXTREME: A Multi-Lingual and Multi-Task Benchmark for the Legal Domain'
  • Content: Published Version
  • Language: English
  • Licence: Creative Commons: Attribution 4.0 International (CC BY 4.0)

Downloads: 1 download since deposited on 02 Dec 2024 (1 download in the last 12 months)
