ZORA (Zurich Open Repository and Archive)

Challenges in the alignment, management and exploitation of large and richly annotated multi-parallel corpora

Graën, Johannes; Clematide, Simon (2015). Challenges in the alignment, management and exploitation of large and richly annotated multi-parallel corpora. In: 3rd Workshop on the Challenges in the Management of Large Corpora, Lancaster, 20 July 2015. Institut für Deutsche Sprache, 15-20.

Abstract

The availability of large multi-parallel corpora offers an enormous wealth of material to contrastive corpus linguists, translators and language learners, if we can exploit the data properly. Necessary preparation steps include sentence and word alignment across multiple languages. Additionally, linguistic annotation such as part-of-speech tagging, lemmatisation, chunking, and dependency parsing facilitates precise querying of linguistic properties and can be used to extend word alignment to sub-sentential groups. Such highly inter-connected data is stored in a relational database to allow for efficient retrieval and linguistic data mining, which may include the statistics-based selection of good example sentences. The varying information needs of contrastive linguists require a flexible linguistic query language for ad hoc searches. Such queries in the format of generalised treebank query languages will be automatically translated into SQL queries.
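To make the idea of storing aligned, annotated tokens in a relational database more concrete, the following is a minimal sketch using Python's standard sqlite3 module. It is not the authors' schema: the table names (sentence, token, word_alignment), the column names, the sample data and the helper functions build_demo_corpus and aligned_lemmas are all assumptions made for illustration. The SQL in aligned_lemmas shows the kind of statement a translated linguistic query might plausibly resolve to.

```python
import sqlite3


def build_demo_corpus() -> sqlite3.Connection:
    """Create an in-memory database holding a tiny, invented EN/DE sample.

    The schema is hypothetical; it only illustrates tokens with
    annotation layers (lemma, POS) linked by word alignments.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE sentence (id INTEGER PRIMARY KEY, lang TEXT NOT NULL);
        CREATE TABLE token (
            id          INTEGER PRIMARY KEY,
            sentence_id INTEGER NOT NULL REFERENCES sentence(id),
            position    INTEGER NOT NULL,
            form        TEXT NOT NULL,
            lemma       TEXT NOT NULL,
            pos         TEXT NOT NULL
        );
        CREATE TABLE word_alignment (
            source_token INTEGER NOT NULL REFERENCES token(id),
            target_token INTEGER NOT NULL REFERENCES token(id),
            PRIMARY KEY (source_token, target_token)
        );
        -- Index supporting lookups by lemma and part of speech.
        CREATE INDEX token_lemma_pos ON token (lemma, pos);
        """
    )
    conn.executemany(
        "INSERT INTO sentence VALUES (?, ?)", [(1, "en"), (2, "de")]
    )
    conn.executemany(
        "INSERT INTO token VALUES (?, ?, ?, ?, ?, ?)",
        [
            (1, 1, 1, "The", "the", "DET"),
            (2, 1, 2, "house", "house", "NOUN"),
            (3, 1, 3, "burns", "burn", "VERB"),
            (4, 2, 1, "Das", "der", "DET"),
            (5, 2, 2, "Haus", "Haus", "NOUN"),
            (6, 2, 3, "brennt", "brennen", "VERB"),
        ],
    )
    conn.executemany(
        "INSERT INTO word_alignment VALUES (?, ?)", [(1, 4), (2, 5), (3, 6)]
    )
    return conn


def aligned_lemmas(conn: sqlite3.Connection, lemma: str, pos: str,
                   target_lang: str) -> list[str]:
    """Return lemmas in target_lang aligned to source tokens matching lemma/pos."""
    rows = conn.execute(
        """
        SELECT t.lemma
        FROM token AS s
        JOIN word_alignment AS a ON a.source_token = s.id
        JOIN token AS t ON t.id = a.target_token
        JOIN sentence AS ts ON ts.id = t.sentence_id
        WHERE s.lemma = ? AND s.pos = ? AND ts.lang = ?
        ORDER BY t.id
        """,
        (lemma, pos, target_lang),
    ).fetchall()
    return [row[0] for row in rows]
```

Calling aligned_lemmas(build_demo_corpus(), "house", "NOUN", "de") looks up the German lemmas aligned to the English noun "house" in the sample data. A real multi-parallel corpus would need further tables (for example for sentence alignment, chunks and dependency relations), which this sketch leaves out.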

Additional indexing

Item Type: Conference or Workshop Item (Paper), refereed, original work
Communities & Collections: 06 Faculty of Arts > Institute of Computational Linguistics
Dewey Decimal Classification: 000 Computer science, knowledge & systems; 410 Linguistics
Language: English
Event End Date: 20 July 2015
Deposited On: 04 Aug 2015 09:25
Last Modified: 27 Nov 2020 07:23
Publisher: Institut für Deutsche Sprache
Additional Information: URN: urn:nbn:de:bsz:mh39-38261
OA Status: Green
Free access at: Official URL. An embargo period may apply.
Official URL: http://ids-pub.bsz-bw.de/files/3826/Graen_Clematide_Challenges_in_the_Alignment_management_and_exploitation_2015.pdf
Related URLs: http://corpora.ids-mannheim.de/cmlc.html
    http://ids-pub.bsz-bw.de/files/3826/cmlc3-proceedings_2015.pdf
    http://ids-pub.bsz-bw.de/frontdoor/index/index/docId/3826
Full text (PDF):
  • Content: Published Version
  • Licence: Creative Commons: Attribution-NonCommercial-NoDerivs 3.0 Unported (CC BY-NC-ND 3.0)
