This paper discusses experiments performed to understand the geographic and linguistic coverage of web resources on tourism-related themes in Switzerland. The research was prompted by the observation that studies in geographic information retrieval (GIR) and volunteered geographic information (VGI) commonly assume web coverage to be homogeneous across geographic space, themes, and languages. There are, however, strong hints that this assumption is unfounded (Pasley et al. 2008).
Studying geographic web coverage is a preliminary step towards generating geographic data from the web that can be used as valid information. Knowing how well certain areas are covered by information available on the web, how frequently they are mentioned, and which patterns emerge from the collected data helps in deciding which web data to preselect for further investigation. Language is also considered in this experiment, since coverage varies greatly with the language spoken in a place.
Ad hoc tourism information is readily available on the web in the form of pages that contain news, lists, catalogues, reviews, blogs, and multimedia content. This abundance makes tourism a rich use case for generating geographic information from the web.
The key questions driving this research are: 1) What is the geographic distribution of web coverage for tourism-related themes? 2) How does language affect web coverage?