ZORA (Zurich Open Repository and Archive)

SparkSeq: fast, scalable and cloud-ready tool for the interactive genomic data analysis with nucleotide precision

Wiewiórka, Marek S; Messina, Antonio; Pacholewska, Alicja; Maffioletti, Sergio; Gawrysiak, Piotr; Okoniewski, Michał J (2014). SparkSeq: fast, scalable and cloud-ready tool for the interactive genomic data analysis with nucleotide precision. Bioinformatics, 30(18):2652-2653.

Abstract

Many time-consuming analyses of next-generation sequencing data can be addressed with modern cloud computing. The Apache Hadoop-based solutions have become popular in genomics because of their scalability in a cloud infrastructure. So far, most of these tools have been used for batch data processing rather than interactive data querying. The SparkSeq software has been created to take advantage of a new MapReduce framework, Apache Spark, for next-generation sequencing data. SparkSeq is a general-purpose, flexible and easily extendable library for genomic cloud computing. It can be used to build genomic analysis pipelines in Scala and run them in an interactive way. SparkSeq opens up the possibility of customized ad hoc secondary analyses and iterative machine learning algorithms. This article demonstrates its scalability and overall fast performance by running the analyses of sequencing datasets. Tests of SparkSeq also prove that the use of cache and HDFS block size can be tuned for optimal performance on multiple worker nodes.

Additional indexing

Item Type: Journal Article, refereed, original work
Communities & Collections: 04 Faculty of Medicine > Functional Genomics Center Zurich
Dewey Decimal Classification: 570 Life sciences; biology
610 Medicine & health
Scopus Subject Areas: Physical Sciences > Statistics and Probability
Life Sciences > Biochemistry
Life Sciences > Molecular Biology
Physical Sciences > Computer Science Applications
Physical Sciences > Computational Theory and Mathematics
Physical Sciences > Computational Mathematics
Language: English
Date: 15 September 2014
Deposited On: 29 Jan 2015 11:41
Last Modified: 13 Jan 2025 02:36
Publisher: Oxford University Press
ISSN: 1367-4803
OA Status: Hybrid
Free access at: Publisher DOI. An embargo period may apply.
Publisher DOI: https://doi.org/10.1093/bioinformatics/btu343
PubMed ID: 24845651
Full Text (PDF): Published Version
Full Text Language: English
Full Text Description: Nationallizenz 142-005

Statistics

Citations

70 citations in Web of Science®
90 citations in Scopus®

Downloads

73 downloads since deposited on 29 Jan 2015
4 downloads in the last 12 months