Travelling with speakers through time and space: Spatio-temporal modelling of language change


Neureiter, Nico. Travelling with speakers through time and space: Spatio-temporal modelling of language change. 2022, University of Zurich, Faculty of Science.

Abstract

Languages constantly change, but they also keep traces of the past. We can use these traces to study human history. How have languages changed over thousands of years? Where did the Indo-European languages originate, and how have the languages spread from there? Which historical languages have been in contact with each other, and how has that affected their vocabulary? Linguists, anthropologists and historians have studied these and similar questions for a long time. Only recently have researchers started to use quantitative methods to find answers empirically from different types of linguistic data. For my PhD project, I was interested in such methods, specifically in statistical, spatio-temporal models of language change.

Shared inheritance and contact are central processes explaining linguistic similarities. As early as the 19th century, researchers proposed models for both processes: the tree model and the wave model. The tree model describes the inheritance of linguistic features from shared ancestor languages. The wave model describes the spread of linguistic features in space between languages that are in contact. More recently, both concepts – trees and waves – have found implementations in the form of statistical models. Phylogenetic trees model the inheritance of linguistic features from common ancestors. Phylogeography extends phylogenetic models with a geographic component to represent the dispersal of languages from common origins. Both phylogenetic and phylogeographic methods have been widely adopted in recent decades and feature prominently in debates on linguistic classification, the age of language families, and linguistic homelands.

Yet it is unclear whether phylogenetic models adequately represent two core geographic processes of language evolution: the spread of language families and language contact in space. First, while existing phylogeographic models promise to reconstruct the spread of language families, it is unclear to what extent we can trust these reconstructions. Second, current phylogenetic models ignore language contact entirely, which might bias the reconstruction and neglects an important part of the languages’ history. A separate line of research has developed various statistical models of language contact, but these have often neglected shared inheritance. Unified statistical models of language evolution are still missing. The contributions of this thesis were developed in response to these research gaps and are presented in four research articles (Chapters II-V).

Chapter II evaluates the adequacy of phylogeographic models for reconstructing linguistic homelands. We simulate the diversification and geographic dispersal of languages in two distinct historical scenarios, migration and expansion, and test how well state-of-the-art phylogeographic methods can reconstruct the simulated homeland. In both scenarios, we pay special attention to the effect of directional trends, i.e. predominant movement in a particular direction. Directional trends can be expected in the actual dispersal of languages but are not explicitly covered by the assumptions of current phylogeographic models. We find that in the expansion scenario, the reconstruction works well and is robust to directional trends, but in the migration scenario, the reconstruction error increases in proportion to the directional trend. Chapter II further discusses how these findings can be interpreted in a historical context and gives recommendations for researchers applying phylogeographic methods.

Chapter III introduces contacTrees, a new phylogenetic model with language contact that is made available as a package for the phylogenetics software BEAST 2. It infers contact events in a phylogenetic tree, where a receiver language borrows linguistic features from a donor language. We test the model in a simulation study and apply it in a case study on Celtic, Germanic and Romance languages. The case study shows that contacTrees can indeed infer contact edges corresponding to known contact events with documented loanwords, as well as latent contact effects.

In Chapter IV, we introduce sBayes, a Bayesian model of linguistic areas. The model disentangles signals of universal preference, shared inheritance and contact. It thereby detects linguistic areas as groups of languages that are better explained by a corresponding areal distribution than by the confounding distributions for universal preference and language families. The chapter includes simulation studies and two case studies to demonstrate the model’s applicability. In addition, the model is already in use in several follow-up studies. One of them is presented in Chapter V, where we apply sBayes in a case study on languages of the Ancient Near East. This case study is particularly interesting since it involves historical languages, including some of the oldest attested languages. The inferred area – consisting only of Hurrian and Sumerian – is unlikely to be a contact area, but may be the result of a historical shift in the worldwide distribution of linguistic features, due to the spread of large language families and macro areas.

This thesis reflects on the current developments towards spatio-temporal models of language change and contributes new models in this direction. I reviewed and tested phylogeographic models of language dispersal and gave recommendations for application studies. I further identified shortcomings in phylogenetic models of language evolution and extended them to incorporate language contact. I proposed a new model of linguistic areas and applied it in relevant case studies. Finally, I give an outlook on how the methods in this thesis can be extended and offer a broader view of the paths ahead in the quantitative study of language evolution.


Additional indexing

Item Type: Dissertation (monographical)
Referees: Weibel Robert, Ranacher Peter, Purves Ross S, Glaser Elvira, Bouckaert Remco R
Communities & Collections: 07 Faculty of Science > Institute of Geography; UZH Dissertations
Dewey Decimal Classification: 910 Geography & travel
Language: English
Place of Publication: Zürich
Date: 2022
Deposited On: 25 Jan 2023 10:59
Last Modified: 25 Jan 2023 10:59
Number of Pages: 158
OA Status: Closed