


Visualization and interactive exploration of spatio-temporal and thematic information in digital text archives


Bruggmann, André. Visualization and interactive exploration of spatio-temporal and thematic information in digital text archives. 2017, University of Zurich, Faculty of Science.

Abstract

While rapidly growing unstructured and semi-structured online digital text archives (e.g., Google Books) potentially offer a wealth of useful and important information to all of us in the information society, limited access mechanisms hinder the effective and efficient extraction of interesting, meaningful, and relevant information from these data archives. Adopting a GIScience perspective in this thesis, we aim to provide interested information seekers with visual and interactive means to access relevant spatial, temporal, and thematic information, and latent structures found in large digital text archives, using a typical digital text archive in the humanities as a case study. Unstructured and semi-structured, now increasingly digitally accessible text archives from the humanities are particularly interesting for geographers, as they contain a wealth of spatial, temporal, and thematic information, largely untapped for spatio-temporal and thematic data analyses in geography to date.
We address this research challenge using a three-pronged approach, informed by state-of-the-art GIScience methods and techniques. First, we demonstrate that spatial (i.e., place names), temporal (i.e., dates), and thematic information (i.e., topics in text documents) can be automatically retrieved from the Historical Dictionary of Switzerland (HDS), as one typical, digitally available semi-structured text archive in the humanities. We then show that the retrieved information can be meaningfully transformed and reorganized using a spatialization approach, such that this information can be presented to information seekers in the humanities in two-dimensional spatialized displays for further data exploration. These spatialized displays visually uncover latent spatio-temporal and thematic structures in the HDS text archive. Finally, adopting a user-centered graphical interface design and evaluation approach, we integrate spatialized displays in interactive online web interfaces, to make reorganized spatio-temporal and thematic information from the HDS available to information seekers for further exploration and knowledge discovery.
To this end, we constructed spatialized network maps and a spatialized thematic landscape map display from spatio-temporal and thematic information automatically retrieved from the digital HDS. The spatialized network maps depict relationships between Swiss toponyms in different centuries, based on how often toponyms co-occur in the same HDS articles. The spatialized thematic landscape map display, created with the self-organizing map technique, displays HDS articles as points on a map, where thematically similar articles are placed closer to one another than to semantically less similar articles. The maps can be explored interactively. To create useful and usable interactive web interfaces that include the spatialized displays, we involved target users early in the interface design and development process. Target users provided valuable feedback in the utility and usability evaluations we performed. This helped us iteratively develop perceptually salient and cognitively supportive graphical user interfaces to the HDS text archive, and also facilitated access to and sense-making of the depicted information about the history of Switzerland.
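The co-occurrence weighting behind the spatialized network maps can be illustrated with a minimal Python sketch. It assumes that toponyms have already been extracted from each article and that each article has already been assigned to a century. The function name `cooccurrence_by_century` and the `(century, toponyms)` input layout are hypothetical and are not taken from the thesis. In this sketch, the weight of an edge between two toponyms is the number of articles in which they appear together.

```python
from collections import Counter, defaultdict
from itertools import combinations


def cooccurrence_by_century(articles):
    """Build one weighted toponym co-occurrence network per century.

    Hypothetical sketch: `articles` is an iterable of (century, toponyms)
    pairs, where `toponyms` lists the place names already extracted from
    one article. Returns {century: Counter({(a, b): n_articles})}.
    """
    networks = defaultdict(Counter)
    for century, toponyms in articles:
        # Deduplicate within an article so each pair counts once per article,
        # and sort so that (a, b) and (b, a) map to the same undirected edge.
        for a, b in combinations(sorted(set(toponyms)), 2):
            networks[century][(a, b)] += 1
    return networks
```

For example, with the input `[(19, ["Zürich", "Bern", "Basel"]), (19, ["Bern", "Zürich"])]`, the 19th-century network gives the edge `("Bern", "Zürich")` a weight of 2. Counting each pair at most once per article keeps long articles that repeat a place name from dominating the edge weights.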
This thesis makes three major contributions. First, we provide a comprehensive text information retrieval approach that goes beyond existing approaches for extracting information from text documents in the humanities, and we present a completely automatic approach to retrieve spatio-temporal and thematic information from a semi-structured text archive. Second, we illustrate how spatialization techniques can be used to depict spatio-temporal and thematic relationships and interconnections in the humanities, revealed by transforming and reorganizing the retrieved information. Third, we contribute a systematic user-centered method to incorporate the spatialized displays in interactive web interfaces. This allows interested information seekers in the humanities to explore spatio-temporal and thematic relationships and structures interactively, using advanced geovisual analytics approaches that are common in GIScience but still largely unknown in history and the humanities.
The systematic evaluation of the information automatically retrieved from the HDS showed satisfactory quality, which suggests that this approach might also succeed for other similar unstructured and semi-structured digital text archives in the humanities that include spatio-temporal and thematic information. Furthermore, the systematic evaluation of the constructed spatialized displays with target users suggests that spatialized network displays depicting spatio-temporal relationships and interconnections, coupled with a spatialized thematic landscape depicting semantic similarities among text documents, help target users in the humanities gain new insights into the spatio-temporal and thematic information buried in the HDS. The results of a final combined utility and usability study further reveal that target users are indeed able to explore the HDS text archive interactively and visually, and to make sense of the novel spatialized displays.
In summary, this thesis highlights how advanced GIScience methods and approaches can be successfully transferred to the humanities to facilitate information access from growing unstructured and semi-structured text archives that also include spatio-temporal and thematic information.

Additional indexing

Item Type: Dissertation (monographical)
Referees: Fabrikant Sara Irina, Purves Ross S, Hürlimann Katja
Communities & Collections: 07 Faculty of Science > Institute of Geography; UZH Dissertations
Dewey Decimal Classification: 910 Geography & travel
Language: English
Place of Publication: Zürich
Date: 2017
Deposited On: 14 Feb 2018 16:00
Last Modified: 07 Apr 2020 07:08
Number of Pages: 239
OA Status: Green
Free access at: Related URL. An embargo period may apply.
Related URLs: https://www.recherche-portal.ch/permalink/f/5u2s2l/ebi01_prod010935693 (Library Catalogue)

Download

Green Open Access

Download PDF  'Visualization and interactive exploration of spatio-temporal and thematic information in digital text archives'.
Content: Published Version
Language: English
Filetype: PDF
Size: 9MB