ZORA (Zurich Open Repository and Archive)

The LiRI Corpus Platform

Graën, Johannes; Schaber, Jonathan; McDonald, Daniel; Mustač, Igor; Rajović, Nikolina; Schneider, Gerold; Vuković, Teodora; Zehr, Jeremy; Bubenhofer, Noah (2024). The LiRI Corpus Platform. In: CLARIN Annual Conference 2023, Leuven, Belgium, 16 October 2023 - 18 October 2023. Linköpings universitet Electronic Press, 62-75.

Abstract

We present the LiRI Corpus Platform (LCP), a software system and infrastructure for querying a vast array of corpora of different kinds. It heavily relies on the PostgreSQL relational database management system, employing state-of-the-art data representation and indexing techniques, which lead to significant performance gains when querying, even for structurally complex queries involving nested logical operations and quantifiers. In this work, we describe the requirements that led to the development of this novel system, discuss methods from corpus linguistics and beyond that we considered key for such a system, and provide details on a number of technological features that we take advantage of. Our platform also comes with its own query language tailored both to the requirements in terms of information need and our philosophy of how to define corpora in an abstract way.

Additional indexing

Item Type: Conference or Workshop Item (Paper), refereed, original work
Communities & Collections: 06 Faculty of Arts > Linguistic Research Infrastructure (LiRI); 06 Faculty of Arts > Zurich Center for Linguistics
Dewey Decimal Classification: 410 Linguistics; 000 Computer science, knowledge & systems
Language: English
Event End Date: 18 October 2023
Deposited On: 18 Jul 2024 11:33
Last Modified: 28 Jan 2025 06:06
Publisher: Linköpings universitet Electronic Press
Series Name: Linköping Electronic Conference Proceedings
ISSN: 1650-3686
ISBN: 978-91-8075-740-9
OA Status: Gold
Free access at: Publisher DOI. An embargo period may apply.
Publisher DOI: https://doi.org/10.3384/ecp210010
Related URLs: https://www.zora.uzh.ch/id/eprint/257131/
PDF: 'The LiRI Corpus Platform'
  • Content: Published Version
  • Language: English
  • Licence: Creative Commons: Attribution 4.0 International (CC BY 4.0)
