ZORA (Zurich Open Repository and Archive)

Lexedata: A toolbox to edit CLDF lexical datasets

Kaiping, Gereon A; Steiger, Melvin S; Chousou-Polydouri, Natalia (2022). Lexedata: A toolbox to edit CLDF lexical datasets. Journal of Open Source Software, 7(72):4140.

Abstract

Lexedata is a collection of tools to support the editing process of comparative lexical data. Wordlists are a comparatively easily collected type of language documentation that is nonetheless quite data-rich and useful for the systematic comparison of languages (List et al., 2021). They are an important resource in comparative and historical linguistics, including their use as raw data for language phylogenetics (Gray et al., 2009; Grollemund et al., 2015).

The lexedata package uses the “Cross-Linguistic Data Format” (CLDF; Forkel et al., 2018, 2021) as the main data format for a relational database containing forms, languages, concepts, and etymological relationships. The CLDF specification builds on the W3C's CSV on the Web (CSVW; Pollock et al., 2015) specification, so a CLDF dataset consists of one or more comma-separated value (CSV) files that get their semantics from a metadata file in JSON format.
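To illustrate how CSV files can get their semantics from a JSON metadata file, the following is a minimal sketch using only the Python standard library. It is not Lexedata's own code. The column names, the file name `forms.csv`, the helper `read_forms`, and the two toy rows are assumptions made up for illustration; the `propertyUrl` values point to terms in the CLDF ontology.

```python
import csv
import io
import json

# Namespace of the CLDF ontology terms.
CLDF = "http://cldf.clld.org/v1.0/terms.rdf#"

# Illustrative CSVW-style metadata: it names the table and says which
# CLDF property each (freely named) CSV column stands for.
METADATA_JSON = json.dumps({
    "@context": ["http://www.w3.org/ns/csvw", {"@language": "en"}],
    "dc:conformsTo": CLDF + "Wordlist",
    "tables": [{
        "url": "forms.csv",  # hypothetical file name
        "dc:conformsTo": CLDF + "FormTable",
        "tableSchema": {
            "columns": [
                {"name": "ID", "propertyUrl": CLDF + "id"},
                {"name": "Language_ID", "propertyUrl": CLDF + "languageReference"},
                {"name": "Parameter_ID", "propertyUrl": CLDF + "parameterReference"},
                {"name": "Form", "propertyUrl": CLDF + "form"},
            ],
            "primaryKey": ["ID"],
        },
    }],
})

# Toy table content (invented data, held in a string instead of a file).
FORMS_CSV = (
    "ID,Language_ID,Parameter_ID,Form\n"
    "f1,lang_a,hand,mano\n"
    "f2,lang_b,hand,main\n"
)


def read_forms(metadata_json, csv_text):
    """Return the form rows, keyed by CLDF property name instead of column name."""
    metadata = json.loads(metadata_json)
    table = next(
        t for t in metadata["tables"]
        if t["dc:conformsTo"] == CLDF + "FormTable"
    )
    # Map each column name to the short name of its CLDF property.
    role = {
        col["name"]: col["propertyUrl"].split("#")[1]
        for col in table["tableSchema"]["columns"]
    }
    return [
        {role[name]: value for name, value in row.items()}
        for row in csv.DictReader(io.StringIO(csv_text))
    ]


forms = read_forms(METADATA_JSON, FORMS_CSV)
```

Because the lookup goes through the metadata rather than through fixed column names, a dataset could rename `Form` to something else in both files and the same reader would still find the forms. In practice a dedicated CLDF library would handle this, along with datatypes and foreign keys, which this sketch ignores.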

Implemented in Python as a set of command line tools, Lexedata provides various helper functions to address issues that frequently arise when working with comparative wordlists for multiple languages, as shown in Figure 1. These include importing from and exporting to formats more familiar to linguists, as well as bulk edit functions and associated integrity checks. For example, there are scripts for importing data from MS Excel sheets of various common formats into CLDF, checking for homophones, manipulating etymological judgements, and exporting coded datasets for use in phylogenetic software.
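As an illustration of the kind of integrity check mentioned above, a homophone check can be sketched as grouping forms by language and form and reporting groups that cover more than one concept. This is a simplified sketch under stated assumptions, not Lexedata's implementation or command line interface: the function `find_homophones`, the column names, and the toy rows are all hypothetical, and real data would need decisions about normalisation and about what counts as "the same form".

```python
from collections import defaultdict


def find_homophones(forms):
    """Map (language, form) to its concepts wherever one form has several concepts."""
    concepts_by_form = defaultdict(set)
    for row in forms:
        key = (row["Language_ID"], row["Form"])
        concepts_by_form[key].add(row["Parameter_ID"])
    # Keep only forms attested for more than one concept in the same language.
    return {
        key: sorted(concepts)
        for key, concepts in concepts_by_form.items()
        if len(concepts) > 1
    }


# Invented toy rows: "kama" means both 'hand' and 'arm' in lang_a.
toy_forms = [
    {"Language_ID": "lang_a", "Parameter_ID": "hand", "Form": "kama"},
    {"Language_ID": "lang_a", "Parameter_ID": "arm", "Form": "kama"},
    {"Language_ID": "lang_a", "Parameter_ID": "foot", "Form": "pe"},
    {"Language_ID": "lang_b", "Parameter_ID": "hand", "Form": "kama"},
]

homophones = find_homophones(toy_forms)
```

Here only the `lang_a` form is reported, because the identical string in `lang_b` belongs to a different language and to a single concept. Whether such a pair is a true homophone or one polysemous word is a linguistic judgement that the check can only flag for review.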

Additional indexing

Item Type: Journal Article, refereed, original work
Communities & Collections: 07 Faculty of Science > Institute of Geography; 08 Research Priority Programs > Language and Space; 06 Faculty of Arts > Department of Comparative Language Science
Dewey Decimal Classification: 400 Language; 490 Other languages; 890 Other literatures; 410 Linguistics; 910 Geography & travel
Language: English
Date: 20 April 2022
Deposited On: 16 Jun 2022 12:47
Last Modified: 16 Jun 2022 12:47
Publisher: Open Journals
ISSN: 2475-9066
OA Status: Gold
Free access at: Publisher DOI. An embargo period may apply.
Publisher DOI: https://doi.org/10.21105/joss.04140
Content: Published Version
Licence: Creative Commons: Attribution 4.0 International (CC BY 4.0)

Downloads

25 downloads since deposited on 16 Jun 2022
11 downloads in the past 12 months