SCIM: universal single-cell matching with unpaired feature sets


Stark, Stefan G; Ficek, Joanna; Locatello, Francesco; Bonilla, Ximena; Chevrier, Stéphane; Singer, Franziska; Tumor Profiler Consortium; Rätsch, Gunnar; Lehmann, Kjong-Van (2020). SCIM: universal single-cell matching with unpaired feature sets. Bioinformatics, 36(Supp.):i919-i927.

Abstract

MOTIVATION

Recent technological advances have led to an increase in the production and availability of single-cell data. The ability to integrate a set of multi-technology measurements would allow the identification of biologically or clinically meaningful observations through the unification of the perspectives afforded by each technology. In most cases, however, profiling technologies consume the used cells and thus pairwise correspondences between datasets are lost. Due to the sheer size single-cell datasets can acquire, scalable algorithms that are able to universally match single-cell measurements carried out in one cell to its corresponding sibling in another technology are needed.

RESULTS

We propose Single-Cell data Integration via Matching (SCIM), a scalable approach to recover such correspondences in two or more technologies. SCIM assumes that cells share a common (low-dimensional) underlying structure and that the underlying cell distribution is approximately constant across technologies. It constructs a technology-invariant latent space using an autoencoder framework with an adversarial objective. Multi-modal datasets are integrated by pairing cells across technologies using a bipartite matching scheme that operates on the low-dimensional latent representations. We evaluate SCIM on a simulated cellular branching process and show that the cell-to-cell matches derived by SCIM reflect the same pseudotime on the simulated dataset. Moreover, we apply our method to two real-world scenarios, a melanoma tumor sample and a human bone marrow sample, where we pair cells from a scRNA dataset to their sibling cells in a CyTOF dataset achieving 90% and 78% cell-matching accuracy for each one of the samples, respectively.
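The pairing step described above can be illustrated with a small sketch. It assumes the latent codes for each technology have already been produced by some encoder, and it uses a generic minimum-cost one-to-one assignment (the Hungarian algorithm) on squared Euclidean distances as a stand-in. The abstract does not specify SCIM's actual bipartite matching scheme, so the function names `squared_distance`, `min_cost_matching`, and `match_cells`, and the toy coordinates, are hypothetical and for illustration only.

```python
import math


def squared_distance(a, b):
    """Squared Euclidean distance between two latent vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def min_cost_matching(cost):
    """Hungarian algorithm for an n x m cost matrix with n <= m.

    Returns a list `match` where match[i] is the column assigned to row i.
    Illustrative stand-in only; not taken from the SCIM code base.
    """
    n, m = len(cost), len(cost[0])
    u = [0.0] * (n + 1)      # row potentials (1-based)
    v = [0.0] * (m + 1)      # column potentials (1-based)
    p = [0] * (m + 1)        # p[j] = row currently matched to column j (0 = none)
    way = [0] * (m + 1)      # back-pointers along the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [math.inf] * (m + 1)
        used = [False] * (m + 1)
        while True:
            used[j0] = True
            i0 = p[j0]
            delta = math.inf
            j1 = 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j] = cur
                        way[j] = j0
                    if minv[j] < delta:
                        delta = minv[j]
                        j1 = j
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        # Flip the assignments along the augmenting path.
        while j0:
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    match = [-1] * n
    for j in range(1, m + 1):
        if p[j]:
            match[p[j] - 1] = j - 1
    return match


def match_cells(latent_a, latent_b):
    """Pair each cell in latent_a with a distinct cell in latent_b."""
    cost = [[squared_distance(a, b) for b in latent_b] for a in latent_a]
    return min_cost_matching(cost)


# Hypothetical 2-D latent codes for three cells from each technology.
codes_a = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
codes_b = [(5.1, 4.9), (0.1, -0.1), (0.9, 1.2)]
print(match_cells(codes_a, codes_b))  # [1, 2, 0]
```

A dense assignment like this scales cubically with the number of cells, so it only demonstrates the idea of matching in a shared latent space; a method intended for large single-cell datasets would need a sparser or otherwise more scalable formulation.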

AVAILABILITY AND IMPLEMENTATION

https://github.com/ratschlab/scim.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

Additional indexing

Item Type: Journal Article, refereed, original work
Communities & Collections: 04 Faculty of Medicine > University Hospital Zurich > Clinic for Oncology and Hematology
Dewey Decimal Classification: 610 Medicine & health
Scopus Subject Areas: Physical Sciences > Statistics and Probability; Life Sciences > Biochemistry; Life Sciences > Molecular Biology; Physical Sciences > Computer Science Applications; Physical Sciences > Computational Theory and Mathematics; Physical Sciences > Computational Mathematics
Language: English
Date: 30 December 2020
Deposited On: 01 Feb 2021 06:37
Last Modified: 06 Feb 2021 04:30
Publisher: Oxford University Press
ISSN: 1367-4803
OA Status: Hybrid
Free access at: PubMed ID. An embargo period may apply.
Publisher DOI: https://doi.org/10.1093/bioinformatics/btaa843
PubMed ID: 33381818

Download

Hybrid Open Access

Content: Published Version
Filetype: PDF
Size: 512kB
Licence: Creative Commons: Attribution 4.0 International (CC BY 4.0)