Automated Dating of the World’s Language Families Based on Lexical Similarity


Holman, Eric W; Brown, Cecil H; Wichmann, Søren; Müller, André; Velupillai, Viveka; Hammarström, Harald; Sauppe, Sebastian; Jung, Hagen; Bakker, Dik; Brown, Pamela; Belyaev, Oleg; Urban, Matthias; Mailhammer, Robert; List, Johann-Mattis; Egorov, Dmitry (2011). Automated Dating of the World’s Language Families Based on Lexical Similarity. Current Anthropology, 52(6):841-875.

Abstract

This paper describes a computerized alternative to glottochronology for estimating elapsed time since parent languages diverged into daughter languages. The method, developed by the Automated Similarity Judgment Program (ASJP) consortium, is different from glottochronology in four major respects: (1) it is automated and thus is more objective, (2) it applies a uniform analytical approach to a single database of worldwide languages, (3) it is based on lexical similarity as determined from Levenshtein (edit) distances rather than on cognate percentages, and (4) it provides a formula for date calculation that mathematically recognizes the lexical heterogeneity of individual languages, including parent languages just before their breakup into daughter languages. Automated judgments of lexical similarity for groups of related languages are calibrated with historical, epigraphic, and archaeological divergence dates for 52 language groups. The discrepancies between estimated and calibration dates are found to be on average 29% as large as the estimated dates themselves, a figure that does not differ significantly among language families. As a resource for further research that may require dates of known level of accuracy, we offer a list of ASJP time depths for nearly all the world’s recognized language families and for many subfamilies.
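As an illustration of point (3), the following is a minimal sketch of how a Levenshtein (edit) distance between two word forms can be computed and averaged over a word list. The abstract does not give the exact procedure, so the division by the longer word's length, the function names, and the sample strings are all assumptions made for illustration and are not the ASJP consortium's implementation.

```python
from typing import Sequence


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn string a into string b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def normalized_distance(a: str, b: str) -> float:
    """Edit distance divided by the length of the longer word
    (an assumed normalization, so that values fall in [0, 1])."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def mean_normalized_distance(words_a: Sequence[str],
                             words_b: Sequence[str]) -> float:
    """Average normalized distance over word pairs with the same meaning."""
    pairs = list(zip(words_a, words_b))
    return sum(normalized_distance(x, y) for x, y in pairs) / len(pairs)


# Hypothetical word forms for two related languages (not ASJP data).
print(normalized_distance("hand", "hant"))  # 0.25
print(mean_normalized_distance(["hand", "nas"], ["hant", "nos"]))
```

A lower average distance corresponds to higher lexical similarity. Turning such a similarity score into a date requires the calibrated formula described in the paper itself, which is not reproduced here.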

Additional indexing

Item Type: Journal Article, refereed, original work
Communities & Collections: 06 Faculty of Arts > Department of Comparative Linguistics
Dewey Decimal Classification: 490 Other languages; 890 Other literatures; 410 Linguistics
Language: English
Date: 2011
Deposited On: 21 Jun 2017 12:24
Last Modified: 03 Aug 2017 08:22
Publisher: University of Chicago Press
ISSN: 0011-3204
Publisher DOI: https://doi.org/10.1086/662127

Download

Content: Published Version
Filetype: PDF
Size: 461kB

Content: Accepted Version
Filetype: PDF
Size: 287kB
