Publication:

The LiRI Corpus Platform

Date
2024
Conference or Workshop Item
Published version
cris.virtual.orcid: https://orcid.org/0000-0002-5780-5665
cris.virtual.orcid: https://orcid.org/0000-0002-0459-5086
cris.virtual.orcid: https://orcid.org/0000-0002-2134-2013
cris.virtualsource.orcid: 56acab60-8e01-4c2d-a26e-35a806e6b999
cris.virtualsource.orcid: 21fff778-6a26-4132-abdc-3ceff710ddb2
cris.virtualsource.orcid: 28c67ff6-3e63-4ddb-a9a7-1b9b7e569a74
dc.contributor.institution: University of Zurich
dc.date.accessioned: 2024-07-18T11:33:15Z
dc.date.available: 2024-07-18T11:33:15Z
dc.date.issued: 2024-07-09
dc.description.abstract:

We present the LiRI Corpus Platform (LCP), a software system and infrastructure for querying a wide variety of corpora. It relies heavily on the PostgreSQL relational database management system and employs state-of-the-art data representation and indexing techniques, which lead to significant performance gains when querying, even for structurally complex queries involving nested logical operations and quantifiers. In this work, we describe the requirements that led to the development of this novel system, discuss methods from corpus linguistics and beyond that we considered key for such a system, and provide details on a number of technological features that we take advantage of. Our platform also comes with its own query language, tailored both to the requirements in terms of information need and to our philosophy of how to define corpora in an abstract way.

dc.identifier.doi: 10.3384/ecp210010
dc.identifier.isbn: 978-91-8075-740-9
dc.identifier.issn: 1650-3740
dc.identifier.uri: https://www.zora.uzh.ch/handle/20.500.14742/220423
dc.language.iso: eng
dc.subject.ddc: 410 Linguistics
dc.subject.ddc: 000 Computer science, knowledge & systems
dc.title:

The LiRI Corpus Platform

dc.type: conference_item
dcterms.accessRights: info:eu-repo/semantics/openAccess
dcterms.bibliographicCitation.journaltitle: Linköping Electronic Conference Proceedings
dcterms.bibliographicCitation.originalpublishername: Linköping University Electronic Press
dcterms.bibliographicCitation.pageend: 75
dcterms.bibliographicCitation.pagestart: 62
dspace.entity.type: Publication (en)
oairecerif.event.country: Belgium
oairecerif.event.endDate: 2023-10-18
oairecerif.event.place: Leuven
oairecerif.event.startDate: 2023-10-16
uzh.contributor.author: Graën, Johannes
uzh.contributor.author: Schaber, Jonathan
uzh.contributor.author: McDonald, Daniel
uzh.contributor.author: Mustač, Igor
uzh.contributor.author: Rajović, Nikolina
uzh.contributor.author: Schneider, Gerold
uzh.contributor.author: Vuković, Teodora
uzh.contributor.author: Zehr, Jeremy
uzh.contributor.author: Bubenhofer, Noah
uzh.contributor.correspondence: Yes
uzh.contributor.correspondence: No
uzh.contributor.correspondence: No
uzh.contributor.correspondence: No
uzh.contributor.correspondence: No
uzh.contributor.correspondence: No
uzh.contributor.correspondence: No
uzh.contributor.correspondence: No
uzh.contributor.correspondence: No
uzh.document.availability: published_version
uzh.eprint.datestamp: 2024-07-18 11:33:15
uzh.eprint.lastmod: 2025-03-26 13:16:43
uzh.eprint.statusChange: 2024-07-18 11:33:15
uzh.event.presentationType: paper
uzh.event.title: CLARIN Annual Conference 2023
uzh.event.type: conference
uzh.harvester.eth: Yes
uzh.harvester.nb: No
uzh.identifier.doi: 10.5167/uzh-261076
uzh.jdb.eprintsId: 36465
uzh.oastatus.unpaywall: closed
uzh.oastatus.zora: Gold
uzh.publication.citation: Graën, Johannes; Schaber, Jonathan; McDonald, Daniel; Mustač, Igor; Rajović, Nikolina; Schneider, Gerold; Vuković, Teodora; Zehr, Jeremy; Bubenhofer, Noah (2024). The LiRI Corpus Platform. In: CLARIN Annual Conference 2023, Leuven, Belgium, 16 October 2023 - 18 October 2023. Linköping University Electronic Press, 62-75.
uzh.publication.freeAccessAt: doi
uzh.publication.originalwork: original
uzh.publication.publishedStatus: final
uzh.publication.seriesTitle: Linköping Electronic Conference Proceedings
uzh.relatedUrl.url: https://www.zora.uzh.ch/id/eprint/257131/
uzh.workflow.doaj: uzh.workflow.doaj.false
uzh.workflow.eprintid: 261076
uzh.workflow.fulltextStatus: public
uzh.workflow.revisions: 28
uzh.workflow.rightsCheck: keininfo
uzh.workflow.source: Crossref:10.3384/ecp210010
uzh.workflow.status: archive
Files

Original bundle

Name:
CLARIN2023_paper_10_Graen_86.pdf
Size:
869.27 KB
Format:
Adobe Portable Document Format