ZORA (Zurich Open Repository and Archive)

A systematic performance evaluation of clustering methods for single-cell RNA-seq data

Duò, Angelo; Robinson, Mark D; Soneson, Charlotte (2018). A systematic performance evaluation of clustering methods for single-cell RNA-seq data. F1000Research, 7:1141.

Abstract

Subpopulation identification, usually via some form of unsupervised clustering, is a fundamental step in the analysis of many single-cell RNA-seq (scRNA-seq) data sets. This has motivated the development and application of a broad range of clustering methods, based on various underlying algorithms. Here, we provide a systematic and extensible performance evaluation of 14 clustering algorithms implemented in R, including both methods developed explicitly for scRNA-seq data and more general-purpose methods. The methods were evaluated using nine publicly available scRNA-seq data sets as well as three simulations with varying degrees of cluster separability. The same feature selection approaches were used for all methods, allowing us to focus on the performance of the clustering algorithms themselves. We evaluated the ability of the methods to recover known subpopulations, as well as their stability, run time and scalability. We also investigated whether performance could be improved by generating consensus partitions from multiple individual clustering methods. We found substantial differences in performance, run time and stability between the methods, with SC3 and Seurat showing the most favorable results. Additionally, we found that consensus clustering typically did not improve performance compared to the best of the combined methods, but that several of the top-performing methods already perform some type of consensus clustering. All the code used for the evaluation is available on GitHub (https://github.com/markrobinsonuzh/scRNAseq_clustering_comparison). In addition, an R package providing access to data and clustering results, thereby facilitating inclusion of new methods and data sets, is available from Bioconductor (https://bioconductor.org/packages/DuoClustering2018).

Additional indexing

Item Type:Journal Article, refereed, original work
Communities & Collections:07 Faculty of Science > Institute of Molecular Life Sciences
08 Research Priority Programs > Evolution in Action: From Genomes to Ecosystems
Dewey Decimal Classification:570 Life sciences; biology
Scopus Subject Areas:Life Sciences > General Biochemistry, Genetics and Molecular Biology
Life Sciences > General Immunology and Microbiology
Life Sciences > General Pharmacology, Toxicology and Pharmaceutics
Language:English
Date:2018
Deposited On:31 Jan 2020 13:42
Last Modified:05 Sep 2024 03:30
Publisher:Faculty of 1000 Ltd.
ISSN:2046-1402
OA Status:Gold
Free access at:PubMed ID. An embargo period may apply.
Publisher DOI:https://doi.org/10.12688/f1000research.15666.2
PubMed ID:30271584
Full text (PDF): 'A systematic performance evaluation of clustering methods for single-cell RNA-seq data'
  • Content: Published Version
  • Licence: Creative Commons: Attribution 4.0 International (CC BY 4.0)

Statistics

Downloads

49 downloads since deposit on 31 Jan 2020
7 downloads in the past 12 months