ZORA (Zurich Open Repository and Archive)

Detailed simulation of cancer exome sequencing data reveals differences and common limitations of variant callers

Hofmann, Ariane L; Behr, Jonas; Singer, Jochen; Kuipers, Jack; Beisel, Christian; Schraml, Peter; Moch, Holger; Beerenwinkel, Niko (2017). Detailed simulation of cancer exome sequencing data reveals differences and common limitations of variant callers. BMC Bioinformatics, 18(1):8.

Abstract

BACKGROUND Next-generation sequencing of matched tumor and normal biopsy pairs has become a technology of paramount importance for precision cancer treatment. Sequencing costs have dropped tremendously, so that the whole exome of a tumor can now be sequenced for just a fraction of the total treatment costs. However, clinicians and scientists cannot take full advantage of the generated data because the accuracy of analysis pipelines is limited. This particularly concerns the reliable identification of subclonal mutations that are present at very low frequencies in a cancer tissue sample and may be clinically relevant.

RESULTS Using simulations based on kidney tumor data, we compared the performance of nine state-of-the-art variant callers, namely deepSNV, GATK HaplotypeCaller, GATK UnifiedGenotyper, JointSNVMix2, MuTect, SAMtools, SiNVICT, SomaticSniper, and VarScan2. The comparison was done as a function of variant allele frequency and coverage. Our analysis revealed that deepSNV and JointSNVMix2 perform very well, especially in the low-frequency range. We attributed the false positive and false negative calls of the nine tools to specific error sources and assigned these to processing steps of the pipeline. All of these errors can be expected to occur in real data sets. We found that modifying certain steps of the pipeline or parameters of the tools can lead to substantial improvements in performance. Furthermore, a novel integration strategy that combines the ranks of the variants yielded the best performance. More precisely, the rank combination of deepSNV, JointSNVMix2, MuTect, SiNVICT and VarScan2 reached a sensitivity of 78% at a fixed precision of 90%, outperforming all individual tools, whose maximum sensitivity at the same precision was 71%.

CONCLUSIONS The choice of well-performing tools for alignment and variant calling is crucial for the correct interpretation of exome sequencing data obtained from mixed samples, and common pipelines are suboptimal. We were able to relate the substantial differences observed in performance to the underlying statistical models of the tools, and to pinpoint the error sources of false positive and false negative calls. These findings might inspire new software developments that improve exome sequencing pipelines and advance the field of precision cancer treatment.
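The abstract does not specify how the ranks of the variants are combined. As a hedged illustration only, the Python sketch below assumes that each caller assigns a confidence score to every variant it reports, ranks each caller's calls by score, gives a variant that a caller missed a penalty rank (that caller's worst rank + 1), and orders variants by their mean rank across callers. It then reports the best sensitivity reachable over all top-k cutoffs whose precision is at least a fixed threshold, mirroring the "sensitivity at 90% precision" figure quoted above. The function names, the penalty rule, and the mean-rank aggregation are assumptions for illustration, not the authors' method; the toy calls are made up and are not data from the paper.

```python
from statistics import mean


def combine_ranks(scores_by_caller):
    """Hypothetical rank combination (not the authors' exact scheme).

    scores_by_caller maps a caller name to {variant_id: confidence score}.
    Each caller's calls are ranked by descending score (rank 1 = most
    confident); a variant a caller did not report gets that caller's
    worst rank + 1 as a penalty. Variants are returned sorted by mean rank.
    """
    all_variants = set()
    for scores in scores_by_caller.values():
        all_variants.update(scores)

    ranks = {v: [] for v in all_variants}
    for scores in scores_by_caller.values():
        ordered = sorted(scores, key=scores.get, reverse=True)
        rank_of = {v: i + 1 for i, v in enumerate(ordered)}
        penalty = len(ordered) + 1  # assumed rank for variants this caller missed
        for v in all_variants:
            ranks[v].append(rank_of.get(v, penalty))

    # Lower mean rank = stronger support across callers; ties broken by variant ID.
    return sorted(all_variants, key=lambda v: (mean(ranks[v]), v))


def sensitivity_at_precision(ranked, truth, min_precision=0.9):
    """Best sensitivity over all top-k cutoffs whose precision >= min_precision."""
    best, true_pos = 0.0, 0
    for k, variant in enumerate(ranked, start=1):
        if variant in truth:
            true_pos += 1
        if true_pos / k >= min_precision:  # precision of the top-k list
            best = max(best, true_pos / len(truth))  # sensitivity = TP / all true variants
    return best


if __name__ == "__main__":
    # Toy, made-up calls from two hypothetical callers.
    calls = {
        "caller_a": {"chr1:100": 0.99, "chr1:200": 0.80, "chr2:50": 0.30},
        "caller_b": {"chr1:100": 0.95, "chr2:50": 0.90, "chr3:10": 0.20},
    }
    truth = {"chr1:100", "chr1:200", "chr2:50"}
    ranked = combine_ranks(calls)
    print(ranked)  # ['chr1:100', 'chr2:50', 'chr1:200', 'chr3:10']
    print(sensitivity_at_precision(ranked, truth))  # 1.0
```

In this toy case the combined ranking reaches a sensitivity of 1.0 at 90% precision, while `caller_b` on its own only reaches 2/3, because its third-ranked call is a false positive. Averaging ranks rather than raw scores sidesteps the fact that different callers report scores on incomparable scales.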

Additional indexing

Item Type:Journal Article, refereed, original work
Communities & Collections:04 Faculty of Medicine > University Hospital Zurich > Institute of Pathology and Molecular Pathology
Dewey Decimal Classification:610 Medicine & health
Scopus Subject Areas:Life Sciences > Structural Biology
Life Sciences > Biochemistry
Life Sciences > Molecular Biology
Physical Sciences > Computer Science Applications
Physical Sciences > Applied Mathematics
Language:English
Date:3 January 2017
Deposited On:10 Jan 2017 17:22
Last Modified:15 Sep 2024 01:39
Publisher:BioMed Central
ISSN:1471-2105
OA Status:Gold
Free access at:PubMed ID. An embargo period may apply.
Publisher DOI:https://doi.org/10.1186/s12859-016-1417-7
PubMed ID:28049408
Project Information:
  • Funder: FP7
  • Grant ID: 609883
  • Project Title: MERIC - Mechanisms of Evasive Resistance in Cancer
  • Funder: H2020
  • Grant ID: 633974
  • Project Title: SOUND - Statistical multi-Omics UNDerstanding of Patient Samples
Full text (PDF):
  • Content: Published Version
  • Licence: Creative Commons: Attribution 4.0 International (CC BY 4.0)

Statistics

Citations

29 citations in Web of Science®
35 citations in Scopus®

Downloads

90 downloads since deposited on 10 Jan 2017
3 downloads in the past 12 months