How can voting mechanisms improve the robustness and generalizability of toponym disambiguation?


Hu, Xuke; Sun, Yeran; Kersten, Jens; Zhou, Zhiyong; Klan, Friederike; Fan, Hongchao (2023). How can voting mechanisms improve the robustness and generalizability of toponym disambiguation? International Journal of Applied Earth Observation and Geoinformation, 117:103191.

Abstract

Natural language texts, such as tweets and news, contain a vast amount of geospatial information, which can be extracted by first recognizing toponyms in texts (toponym recognition) and then identifying their geospatial representations (toponym disambiguation). This paper focuses on toponym disambiguation, which can be approached by toponym resolution and entity linking. Recently, many novel approaches, especially deep learning-based ones, have been proposed, such as CamCoder, GENRE, and BLINK. However, these approaches have not been compared on the same large datasets. Moreover, there is still room to improve their robustness and generalizability. To address these two research gaps, we propose a spatial clustering-based voting approach that combines several individual approaches, and we compare a voting ensemble with 20 recent and commonly used approaches on 12 public datasets, including several highly challenging ones (e.g., WikToR). The datasets are of six types (tweets, historical documents, news, web pages, scientific articles, and Wikipedia articles) and together contain 98,300 toponyms. Experimental results show that the voting ensemble performs best on all the datasets, achieving an average Accuracy@161km of 0.86, which demonstrates its generalizability and robustness. It also drastically improves the performance of resolving fine-grained places, i.e., POIs, natural features, and traffic ways. The detailed evaluation results can inform future methodological developments and guide the selection of proper approaches based on application needs.
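
The two technical ideas in the abstract, spatial voting over several approaches and the Accuracy@161km metric, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state which clustering algorithm is used, so a simple neighbour count within a radius stands in for it here, and the function names and the `radius_km` parameter are hypothetical. Accuracy@161km is taken in its usual sense, the share of toponyms whose predicted coordinates lie within 161 km of the gold coordinates.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(h)))


def vote(predictions, radius_km=50.0):
    """Return the prediction supported by the most predictions nearby.

    Assumption: a neighbour count within radius_km is used as a simple
    stand-in for spatial clustering; ties go to the earliest prediction.
    """
    best, best_support = None, -1
    for p in predictions:
        support = sum(haversine_km(p, q) <= radius_km for q in predictions)
        if support > best_support:
            best, best_support = p, support
    return best


def accuracy_at_km(predicted, gold, k_km=161.0):
    """Fraction of predictions that fall within k_km of the gold location."""
    hits = sum(haversine_km(p, g) <= k_km for p, g in zip(predicted, gold))
    return hits / len(gold)


# Three hypothetical approaches resolve the toponym "Paris":
# two place it in France, one in Texas.
paris_fr, paris_tx = (48.8566, 2.3522), (33.6609, -95.5555)
winner = vote([paris_fr, (48.86, 2.35), paris_tx])
```

In this example the two predictions near Paris, France, support each other, so `vote` returns one of them and the outlier in Texas is discarded. That is the intuition behind combining several approaches by spatial agreement.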

Statistics

Citations

1 citation in Web of Science®
13 citations in Scopus®

Downloads

36 downloads since deposited on 24 Mar 2023
33 downloads in the past 12 months

Additional indexing

Item Type: Journal Article, refereed, original work
Communities & Collections: 07 Faculty of Science > Institute of Geography
Dewey Decimal Classification: 910 Geography & travel
Scopus Subject Areas: Physical Sciences > Global and Planetary Change; Physical Sciences > Earth-Surface Processes; Physical Sciences > Computers in Earth Sciences; Physical Sciences > Management, Monitoring, Policy and Law
Uncontrolled Keywords: Management, Monitoring, Policy and Law, Computers in Earth Sciences, Earth-Surface Processes, Global and Planetary Change
Language: English
Date: 1 March 2023
Deposited On: 24 Mar 2023 11:05
Last Modified: 25 Jun 2024 03:35
Publisher: Elsevier
ISSN: 0303-2434
OA Status: Gold
Free access at: Publisher DOI. An embargo period may apply.
Publisher DOI: https://doi.org/10.1016/j.jag.2023.103191
  • Content: Published Version
  • Language: English
  • Licence: Creative Commons: Attribution 4.0 International (CC BY 4.0)