Geotagging a diachronic corpus of alpine texts: comparing distinct approaches to toponym recognition


Kew, Tannon; Shaitarova, Anastassia; Meraner, Isabel; Clematide, Simon; Goldzycher, Janis; Volk, Martin (2019). Geotagging a diachronic corpus of alpine texts: comparing distinct approaches to toponym recognition. In: RANLP 2019, Workshop on Language technology for digital historical archives with a special focus on Central-, (South-)Eastern Europe, Middle East and North Africa, Varna, Bulgaria, 5 September 2019. RANLP, 11-18.

Abstract

Geotagging historic and cultural texts provides valuable access to heritage data, enabling location-based searching and new geographically related discoveries. In this paper, we describe two distinct approaches to geotagging a variety of fine-grained toponyms in a diachronic corpus of alpine texts.
By applying a traditional gazetteer-based approach, aided by a few simple heuristics, we attain strong high-precision annotations.
Using the output of this earlier system, we adopt a state-of-the-art neural approach in order to facilitate the detection of new toponyms on the basis of context.
Additionally, we present the results of preliminary experiments on integrating a small amount of crowdsourced annotations to improve overall performance of toponym recognition in our heritage corpus.

Downloads

67 downloads since deposited on 21 Nov 2019
14 downloads in the past 12 months

Additional indexing

Item Type: Conference or Workshop Item (Paper), refereed, further contribution
Communities & Collections: 06 Faculty of Arts > Institute of Computational Linguistics
Dewey Decimal Classification: 000 Computer science, knowledge & systems; 410 Linguistics
Language: English
Event End Date: 5 September 2019
Deposited On: 21 Nov 2019 14:29
Last Modified: 27 May 2022 12:42
Publisher: RANLP
ISBN: 978-954-452-059-5
OA Status: Hybrid
Publisher DOI: https://doi.org/10.26615/978-954-452-059-5_003
Content: Accepted Version