Geotagging a diachronic corpus of alpine texts: comparing distinct approaches to toponym recognition


Kew, Tannon; Shaitarova, Anastassia; Meraner, Isabel; Clematide, Simon; Goldzycher, Janis; Volk, Martin (2019). Geotagging a diachronic corpus of alpine texts: comparing distinct approaches to toponym recognition. In: RANLP 2019, Workshop on Language technology for digital historical archives with a special focus on Central-, (South-)Eastern Europe, Middle East and North Africa, Varna, Bulgaria, 5 September 2019.

Abstract

Geotagging historic and cultural texts provides valuable access to heritage data, enabling location-based searching and new geographically related discoveries. In this paper, we describe two distinct approaches to geotagging a variety of fine-grained toponyms in a diachronic corpus of alpine texts.
By applying a traditional gazetteer-based approach, aided by a few simple heuristics, we attain strong high-precision annotations.
Using the output of this earlier system, we adopt a state-of-the-art neural approach in order to facilitate the detection of new toponyms on the basis of context.
Additionally, we present the results of preliminary experiments on integrating a small amount of crowdsourced annotations to improve overall performance of toponym recognition in our heritage corpus.

Statistics

Downloads

27 downloads since deposited on 21 Nov 2019
27 downloads in the last 12 months

Additional indexing

Item Type: Conference or Workshop Item (Paper), not_refereed, further contribution
Communities & Collections: 06 Faculty of Arts > Institute of Computational Linguistics
Dewey Decimal Classification: 000 Computer science, knowledge & systems; 410 Linguistics
Language: English
Event End Date: 5 September 2019
Deposited On: 21 Nov 2019 14:29
Last Modified: 29 Jul 2020 11:49
OA Status: Hybrid
Publisher DOI: https://doi.org/10.26615/978-954-452-059-5_003

Download

Content: Accepted Version
Filetype: PDF
Size: 4MB