Geographic information retrieval (GIR) is concerned with returning information in response to an information need, typically expressed in terms of a thematic and a spatial component linked by a spatial relationship. However, evaluation initiatives have often failed to show significant differences between simple text baselines and more complex, spatially enabled GIR approaches. We explore the effectiveness of three systems (a text baseline, spatial query expansion, and a full GIR system utilizing both text and spatial indexes) at retrieving documents from a corpus describing mountaineering expeditions, centred around fine-grained toponyms. To allow evaluation, we use user-generated content (UGC) in the form of metadata associated with individual articles to build a test collection of queries and judgments. The test collection allowed us to demonstrate that a GIR-based method significantly outperformed a text baseline for all but very specific queries associated with very small query radii. We argue that such approaches to test collection development have much to offer in the evaluation of GIR.