

ZORA (Zurich Open Repository and Archive)

Challenges in building a multilingual alpine heritage corpus

Volk, Martin; Bubenhofer, Noah; Althaus, Adrian; Bangerter, Maya; Furrer, Lenz; Ruef, Beni (2010). Challenges in building a multilingual alpine heritage corpus. In: Seventh International Conference on Language Resources and Evaluation (LREC), Malta, 19 May 2010 - 21 May 2010.


This paper describes our efforts to build a multilingual heritage corpus of alpine texts. We are currently digitizing the yearbooks of the Swiss Alpine Club, which contain articles in French, German, Italian and Romansch. The articles comprise mountaineering reports from all corners of the earth, as well as scientific topics such as topography, geology and glaciology, and occasional poetry and lyrics. We have already scanned close to 70,000 pages, resulting in a corpus of 25 million words, 10% of which is a parallel French-German corpus. We have solved a number of challenges in automatic language identification and text structure recognition. Our next goal is to identify the great variety of toponyms (e.g. names of mountains and valleys, glaciers and rivers, trails and cabins) in this corpus, and we sketch how a large gazetteer of Swiss topographical names can be exploited for this purpose. Despite the size of the gazetteer, exact matching leads to low recall because of spelling variations, language mixtures and partial repetitions.
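The recall problem mentioned at the end of the abstract can be illustrated with a small sketch. It is not the authors' method: the four-entry gazetteer, the function names, the normalization rules and the 0.8 similarity cutoff are all assumptions chosen for illustration, and the fuzzy step simply uses Python's standard `difflib`. It shows how a spelling variant that an exact lookup misses can still be mapped to a gazetteer entry.

```python
import difflib
import unicodedata

# Hypothetical mini-gazetteer (assumption); the resource described in the
# abstract is a large list of Swiss topographical names.
GAZETTEER = ["Matterhorn", "Piz Bernina", "Aletschgletscher", "Val Bregaglia"]


def normalize(name: str) -> str:
    """Lowercase, strip diacritics, and treat hyphens as spaces."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(no_marks.lower().replace("-", " ").split())


# Map each normalized form back to its canonical gazetteer entry.
NORMALIZED = {normalize(entry): entry for entry in GAZETTEER}


def exact_match(candidate: str):
    """Return the candidate only if it appears verbatim in the gazetteer."""
    return candidate if candidate in GAZETTEER else None


def fuzzy_match(candidate: str, cutoff: float = 0.8):
    """Return the closest gazetteer entry, or None if nothing is similar enough.

    The cutoff is an illustrative assumption, not a tuned value.
    """
    key = normalize(candidate)
    if key in NORMALIZED:
        return NORMALIZED[key]
    close = difflib.get_close_matches(key, list(NORMALIZED), n=1, cutoff=cutoff)
    return NORMALIZED[close[0]] if close else None


if __name__ == "__main__":
    for mention in ["Matterhorn", "Aletsch-Gletscher", "Val Bregalia", "Himalaya"]:
        print(mention, "|", exact_match(mention), "|", fuzzy_match(mention))
```

A looser cutoff raises recall but also admits false matches, so any real threshold would need to be checked against annotated data; language mixtures and partial repetitions would need handling beyond this string-similarity step.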

Additional indexing

Item Type: Conference or Workshop Item (Paper), refereed, original work
Communities & Collections: 06 Faculty of Arts > Institute of Computational Linguistics
Dewey Decimal Classification: 410 Linguistics; 000 Computer science, knowledge & systems
Scopus Subject Areas: Social Sciences & Humanities > Education; Social Sciences & Humanities > Library and Information Sciences; Social Sciences & Humanities > Linguistics and Language; Social Sciences & Humanities > Language and Linguistics
Event End Date: 21 May 2010
Deposited On: 31 May 2010 08:27
Last Modified: 28 Jun 2022 09:38
OA Status: Green

1 citation in Web of Science®
21 citations in Scopus®

400 downloads since deposited on 31 May 2010
23 downloads in the past 12 months