V-pipe: a computational pipeline for assessing viral genetic diversity from high-throughput data


Posada-Céspedes, Susana; Seifert, David; Topolsky, Ivan; Jablonski, Kim Philipp; Metzner, Karin J; Beerenwinkel, Niko (2021). V-pipe: a computational pipeline for assessing viral genetic diversity from high-throughput data. Bioinformatics, 37(12):1673-1680.

Abstract

MOTIVATION

High-throughput sequencing technologies are used increasingly, not only in viral genomics research but also in clinical surveillance and diagnostics. These technologies facilitate the assessment of the genetic diversity in intra-host virus populations, which affects transmission, virulence, and pathogenesis of viral infections. However, there are two major challenges in analysing viral diversity. First, amplification and sequencing errors confound the identification of true biological variants, and second, the large data volumes represent computational limitations.

RESULTS

To support viral high-throughput sequencing studies, we developed V-pipe, a bioinformatics pipeline combining various state-of-the-art statistical models and computational tools for automated end-to-end analyses of raw sequencing reads. V-pipe supports quality control, read mapping and alignment, low-frequency mutation calling, and inference of viral haplotypes. For generating high-quality read alignments, we developed a novel method, called ngshmmalign, based on profile hidden Markov models and tailored to small and highly diverse viral genomes. V-pipe also includes benchmarking functionality providing a standardized environment for comparative evaluations of different pipeline configurations. We demonstrate this capability by assessing the impact of three different read aligners (Bowtie 2, BWA MEM, ngshmmalign) and two different variant callers (LoFreq, ShoRAH) on the performance of calling single-nucleotide variants in intra-host virus populations. V-pipe supports various pipeline configurations and is implemented in a modular fashion to facilitate adaptations to the continuously changing technology landscape.

AVAILABILITY

V-pipe is freely available at https://github.com/cbg-ethz/V-pipe.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

Statistics

Citations: 41 in Web of Science®; 48 in Scopus®

Additional indexing

Item Type: Journal Article, refereed, original work
Communities & Collections: 04 Faculty of Medicine > University Hospital Zurich > Clinic for Infectious Diseases; 04 Faculty of Medicine > Institute of Medical Virology
Dewey Decimal Classification: 610 Medicine & health
Scopus Subject Areas: Physical Sciences > Statistics and Probability; Life Sciences > Biochemistry; Life Sciences > Molecular Biology; Physical Sciences > Computer Science Applications; Physical Sciences > Computational Theory and Mathematics; Physical Sciences > Computational Mathematics
Language: English
Date: 20 January 2021
Deposited On: 19 Jan 2022 11:23
Last Modified: 26 Jun 2024 01:50
Publisher: Oxford University Press
ISSN: 1367-4803
OA Status: Closed
Free access at: Publisher DOI. An embargo period may apply.
Publisher DOI: https://doi.org/10.1093/bioinformatics/btab015
PubMed ID: 33471068
Full text not available from this repository.