Publication:

Densify: An R package to reduce empty cells in dataframes of typological linguistic data

Date
2024
Journal Article
Published version
cris.virtual.orcid: 0000-0002-9087-0565
cris.virtual.orcid: 0000-0002-7703-3471
cris.virtual.orcid: 0000-0002-6319-2332
cris.virtualsource.orcid: 0a73188e-c464-488a-b544-64ea66244d77
cris.virtualsource.orcid: 19102f9f-d890-4292-ac62-bb028e4f3c1b
cris.virtualsource.orcid: b9152a18-bf87-4222-a67d-211bfb1d8bf1
dc.contributor.institution: University of Zurich
dc.date.accessioned: 2024-09-23T12:04:42Z
dc.date.available: 2024-09-23T12:04:42Z
dc.date.issued: 2024-09-06
dc.description.abstract

The R package densify provides a procedure to prune input data frames containing empty cells (or cells with the values "?" or "NA") to denser sub-matrices with fewer empty cells. The pruning process trades off a series of variably weighted concerns, including data retention, coding density (the proportion of non-empty cells), and taxonomic diversity of rows (representing, for example, phylogenetic relations). Users can adjust the relative weights given to these concerns through various parameters so that the densification process best fits their needs. The software is therefore useful for several purposes, including the densification of sparse input matrices and the subsampling of large input matrices according to a procedure that is sensitive to taxonomic structure.
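
The pruning idea in the abstract can be illustrated with a deliberately simplified sketch. The Python code below is not the densify algorithm or its API: it uses an unweighted greedy rule (drop whichever row or column currently has the lowest share of non-empty cells) and ignores taxonomic diversity and the user-adjustable weights entirely. The names `density`, `greedy_prune`, `target_density`, and the toy `data` matrix are hypothetical and exist only for this illustration.

```python
def density(matrix):
    """Proportion of non-empty cells; None marks an empty cell."""
    cells = [cell for row in matrix for cell in row]
    return sum(cell is not None for cell in cells) / len(cells) if cells else 0.0


def greedy_prune(matrix, target_density):
    """Drop the sparsest row or column until target_density is reached.

    Assumption: a plain, unweighted greedy rule, not densify's scoring.
    Returns the indices of the rows and columns that are kept.
    """
    rows = list(range(len(matrix)))
    cols = list(range(len(matrix[0]))) if matrix else []

    def submatrix():
        return [[matrix[r][c] for c in cols] for r in rows]

    while len(rows) > 1 and len(cols) > 1 and density(submatrix()) < target_density:
        # Share of non-empty cells per remaining row and per remaining column.
        row_fill = {r: sum(matrix[r][c] is not None for c in cols) / len(cols) for r in rows}
        col_fill = {c: sum(matrix[r][c] is not None for r in rows) / len(rows) for c in cols}
        worst_row = min(rows, key=row_fill.get)
        worst_col = min(cols, key=col_fill.get)
        # Remove whichever is sparser; ties go to the row.
        if row_fill[worst_row] <= col_fill[worst_col]:
            rows.remove(worst_row)
        else:
            cols.remove(worst_col)
    return rows, cols


# Toy matrix: rows stand for languages, columns for coded variables.
data = [
    ["a", "b", None],
    ["c", None, None],
    ["d", "e", "f"],
    [None, None, None],
    ["g", "h", None],
]
kept_rows, kept_cols = greedy_prune(data, target_density=0.8)
print(kept_rows, kept_cols)
```

On this toy input the sketch first drops the fully empty row and then the sparsest column. A procedure of the kind the abstract describes would additionally weigh data retention and the taxonomic diversity of the remaining rows before each removal.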

dc.identifier.doi: 10.21105/joss.07024
dc.identifier.issn: 2475-9066
dc.identifier.uri: https://www.zora.uzh.ch/handle/20.500.14742/221416
dc.language.iso: eng
dc.subject.ddc: 510 Mathematics
dc.subject.ddc: 490 Other languages
dc.subject.ddc: 890 Other literatures
dc.subject.ddc: 410 Linguistics
dc.title

Densify: An R package to reduce empty cells in dataframes of typological linguistic data

dc.type: article
dcterms.accessRights: info:eu-repo/semantics/openAccess
dcterms.bibliographicCitation.journaltitle: Journal of Open Source Software
dcterms.bibliographicCitation.number: 101
dcterms.bibliographicCitation.originalpublishername: Open Journals
dcterms.bibliographicCitation.pagestart: 7024
dcterms.bibliographicCitation.volume: 9
dspace.entity.type: Publication (en)
uzh.contributor.author: Graff, Anna
uzh.contributor.author: Lischka, Marc
uzh.contributor.author: Zakharko, Taras
uzh.contributor.author: Furrer, Reinhard
uzh.contributor.author: Bickel, Balthasar
uzh.contributor.correspondence: Yes
uzh.contributor.correspondence: No
uzh.contributor.correspondence: No
uzh.contributor.correspondence: No
uzh.contributor.correspondence: No
uzh.document.availability: published_version
uzh.eprint.datestamp: 2024-09-23 12:04:42
uzh.eprint.lastmod: 2025-02-04 19:40:17
uzh.eprint.statusChange: 2024-09-23 12:04:42
uzh.harvester.eth: Yes
uzh.harvester.nb: No
uzh.identifier.doi: 10.5167/uzh-262418
uzh.jdb.eprintsId: 42654
uzh.note.public: Conclusions: The R package densify provides users with a flexible and explicit method to generate submatrices from an input matrix in a mathematically principled way. The package documents case examples using a standard sparse linguistic dataset (WALS) and the standard linguistic taxonomy provided by Glottolog. Examples and further usage details for this software are found in the vignette hosted in the software repository on GitHub. Acknowledgements: The authors declare that there are no conflicts of interest.
uzh.oastatus.unpaywall: gold
uzh.oastatus.zora: Gold
uzh.publication.citation: Graff, Anna; Lischka, Marc; Zakharko, Taras; Furrer, Reinhard; Bickel, Balthasar (2024). Densify: An R package to reduce empty cells in dataframes of typological linguistic data. Journal of Open Source Software, 9(101):7024.
uzh.publication.freeAccessAt: doi
uzh.publication.originalwork: original
uzh.publication.publishedStatus: final
uzh.workflow.doaj: uzh.workflow.doaj.true
uzh.workflow.eprintid: 262418
uzh.workflow.fulltextStatus: public
uzh.workflow.revisions: 18
uzh.workflow.rightsCheck: offen
uzh.workflow.source: Crossref:10.21105/joss.07024
uzh.workflow.status: archive
Files

Original bundle

Name:
Furrer2024_densify.pdf
Size:
210.06 KB
Format:
Adobe Portable Document Format