A quantitative analysis of global gazetteers: Patterns of coverage for common feature types


Acheson, Elise; De Sabbata, Stefano; Purves, Ross S (2017). A quantitative analysis of global gazetteers: Patterns of coverage for common feature types. Computers, Environment and Urban Systems, 64:309-320.

Abstract

Gazetteers are important tools used in a wide variety of workflows that depend on linking natural language text to geographical space. The spatial properties of these data sources, such as coverage, balance, and completeness, affect the performance of common tasks such as geoparsing and geocoding. However, little attention has focused on how these properties vary in global gazetteers, particularly across country boundaries and according to feature types. In this paper, we present a detailed investigation of the spatial properties of two open gazetteers with worldwide coverage: GeoNames, and the Getty Thesaurus of Geographic Names (TGN). Using point density maps, correlations, and linear regressions, we analyze the global spatial coverage of each data source for the full set of features and for top feature types: populated places, streams, mountains, and hills. Results show wide discrepancies in coverage between the two datasets, sharp changes in feature type coverage across country borders, and idiosyncratic patterns dominated by a few countries for the more sparsely covered natural features. As more and more systems rely on recognizing and grounding named places, these patterns can influence the analysis of growing amounts of online text content and reinforce or amplify existing inequalities.

Statistics

Citations

2 citations in Web of Science®
2 citations in Scopus®
4 citations in Microsoft Academic

Downloads

3 downloads since deposited on 20 Dec 2017
3 downloads since 12 months

Additional indexing

Item Type: Journal Article, refereed, original work
Communities & Collections: 07 Faculty of Science > Institute of Geography
Dewey Decimal Classification: 910 Geography & travel
Uncontrolled Keywords: Ecological Modelling, Geography, Planning and Development, General Environmental Science, Urban Studies
Language: English
Date: 2017
Deposited On: 20 Dec 2017 16:37
Last Modified: 16 Nov 2018 10:06
Publisher: Elsevier
ISSN: 0198-9715
OA Status: Closed
Publisher DOI: https://doi.org/10.1016/j.compenvurbsys.2017.03.007
