ZORA (Zurich Open Repository and Archive)

Densify: An R package to reduce empty cells in dataframes of typological linguistic data

Graff, Anna; Lischka, Marc; Zakharko, Taras; Furrer, Reinhard; Bickel, Balthasar (2024). Densify: An R package to reduce empty cells in dataframes of typological linguistic data. Journal of Open Source Software, 9(101):7024.

Abstract

The R package densify provides a procedure to prune input data frames containing empty cells (or cells with values "?" or NA) to denser sub-matrices with fewer empty cells. The pruning process trades off a series of variably weighted concerns, including data retention, coding density (the proportion of non-empty cells), and taxonomic diversity of rows (representing, for example, phylogenetic relations). Users can adapt the relative weights given to these concerns through various parameters so that the densification process best fits their needs. As such, the software is useful for several purposes, including the densification of sparse input matrices and the subsampling of large input matrices according to a procedure that is sensitive to taxonomic structure.
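To make the idea of coding density and iterative pruning concrete, here is a minimal sketch in Python. It is not the densify package, its API, or its algorithm: the function names `coding_density` and `greedy_prune` and the `target` parameter are hypothetical, and the sketch ignores the weighting parameters and the taxonomic-diversity criterion described in the abstract. It only illustrates a simple greedy strategy, assumed here for illustration, that repeatedly drops the sparsest row or column until a target density is reached.

```python
# Illustrative only: a naive greedy pruner, NOT the densify algorithm.
# Cells equal to None, "?" or "NA" are treated as empty (an assumption
# modelled on the abstract's description of empty cells).
EMPTY_VALUES = (None, "?", "NA")


def is_empty(cell):
    """Return True if the cell counts as empty."""
    return cell in EMPTY_VALUES


def coding_density(rows):
    """Proportion of non-empty cells in a list-of-lists matrix."""
    cells = [cell for row in rows for cell in row]
    if not cells:
        return 0.0
    return sum(not is_empty(cell) for cell in cells) / len(cells)


def greedy_prune(rows, target=0.9):
    """Drop the sparsest row or column until density >= target.

    Ties between a row and a column are resolved by dropping the row.
    Returns a new matrix; the input is left unchanged.
    """
    rows = [list(row) for row in rows]
    while rows and rows[0] and coding_density(rows) < target:
        n_cols = len(rows[0])
        row_fill = [sum(not is_empty(c) for c in row) / n_cols for row in rows]
        col_fill = [
            sum(not is_empty(row[j]) for row in rows) / len(rows)
            for j in range(n_cols)
        ]
        worst_row = min(range(len(rows)), key=row_fill.__getitem__)
        worst_col = min(range(n_cols), key=col_fill.__getitem__)
        if row_fill[worst_row] <= col_fill[worst_col]:
            del rows[worst_row]
        else:
            for row in rows:
                del row[worst_col]
    return rows


data = [
    ["a", "b", "c"],
    ["a", "?", "c"],
    [None, "NA", "c"],
    ["a", "b", "?"],
]
pruned = greedy_prune(data, target=0.8)
print(coding_density(data), coding_density(pruned), pruned)
```

A pruner like this optimises density alone, so it can discard whole language families; the trade-off with data retention and taxonomic diversity that the abstract describes is exactly what this sketch leaves out.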

Additional indexing

Item Type: Journal Article, refereed, original work
Communities & Collections: 06 Faculty of Arts > Department of Comparative Language Science; 07 Faculty of Science > Institute of Mathematics
Dewey Decimal Classification: 510 Mathematics; 490 Other languages; 890 Other literatures; 410 Linguistics
Language: English
Date: 6 September 2024
Deposited On: 23 Sep 2024 12:04
Last Modified: 23 Sep 2024 12:04
Publisher: Open Journals
ISSN: 2475-9066
Additional Information: Conclusions: The R package densify provides users with a flexible and explicit method to generate submatrices from an input matrix in a mathematically principled way. The package documents case examples using a standard sparse linguistic dataset (WALS) and the standard linguistic taxonomy provided by Glottolog. Examples and further usage details for this software are found in the vignette hosted in the software repository on GitHub. Acknowledgements: The authors declare that there are no conflicts of interest.
OA Status: Gold
Free access at: Publisher DOI. An embargo period may apply.
Publisher DOI: https://doi.org/10.21105/joss.07024
Full text (PDF):
  • Content: Published Version
  • Language: English
  • Licence: Creative Commons: Attribution 4.0 International (CC BY 4.0)
