Location Reference Recognition from Texts: A Survey and Comparison


Hu, Xuke; Zhou, Zhiyong; Li, Hao; Hu, Yingjie; Gu, Fuqiang; Kersten, Jens; Fan, Hongchao; Klan, Friederike (2024). Location Reference Recognition from Texts: A Survey and Comparison. ACM Computing Surveys, 56(5):1-37.

Abstract

A vast amount of location information exists in unstructured texts, such as social media posts, news stories, scientific articles, web pages, travel blogs, and historical archives. Geoparsing refers to recognizing location references from texts and identifying their geospatial representations. While geoparsing can benefit many domains, a summary of its specific applications is still missing. Further, there is a lack of a comprehensive review and comparison of existing approaches for location reference recognition, which is the first and core step of geoparsing. To fill these research gaps, this review first summarizes seven typical application domains of geoparsing: geographic information retrieval, disaster management, disease surveillance, traffic management, spatial humanities, tourism management, and crime management. We then review existing approaches for location reference recognition by categorizing these approaches into four groups based on their underlying functional principle: rule-based, gazetteer matching–based, statistical learning–based, and hybrid approaches. Next, we thoroughly evaluate the correctness and computational efficiency of the 27 most widely used approaches for location reference recognition based on 26 public datasets with different types of texts (e.g., social media posts and news stories) containing 39,736 location references worldwide. Results from this thorough evaluation can help inform future methodological developments and can help guide the selection of proper approaches based on application needs.

Statistics

Citations

2 citations in Web of Science®
6 citations in Scopus®

Downloads

6 downloads since deposited on 01 Dec 2023
6 downloads in the past 12 months

Additional indexing

Item Type: Journal Article, refereed, original work
Communities & Collections: 07 Faculty of Science > Institute of Geography
Dewey Decimal Classification: 910 Geography & travel
Uncontrolled Keywords: General Computer Science, Theoretical Computer Science
Language: English
Date: 31 May 2024
Deposited On: 01 Dec 2023 13:12
Last Modified: 29 Jun 2024 01:40
Publisher: ACM Digital Library
ISSN: 0360-0300
OA Status: Hybrid
Free access at: Publisher DOI. An embargo period may apply.
Publisher DOI: https://doi.org/10.1145/3625819
  • Content: Published Version
  • Language: English
  • Licence: Creative Commons: Attribution 4.0 International (CC BY 4.0)