UZH-Logo

Maintenance Infos

PepSplice: cache-efficient search algorithms for comprehensive identification of tandem mass spectra


Roos, F F; Jacob, R; Grossmann, J; Fischer, B; Buhmann, J M; Gruissem, W; Baginsky, S; Widmayer, P (2007). PepSplice: cache-efficient search algorithms for comprehensive identification of tandem mass spectra. Bioinformatics, 23(22):3016-3023.

Abstract

MOTIVATION: Tandem mass spectrometry allows for high-throughput identification of complex protein samples. Searching tandem mass spectra against sequence databases is the main analysis method nowadays. Since many peptide variations are possible, including them in the search space seems only logical. However, the search space usually grows exponentially with the number of independent variations and may therefore overwhelm computational resources. RESULTS: We provide fast, cache-efficient search algorithms to screen large peptide search spaces including non-tryptic peptides, whole genomes, dozens of posttranslational modifications, unannotated point mutations and even unannotated splice sites. All these search spaces can be screened simultaneously. By optimizing the cache usage, we achieve a calculation speed that closely approaches the limits of the hardware. At the same time, we control the size of the overall search space by limiting the combinations of variations that can co-occur on the same peptide. Using a hypergeometric scoring scheme, we applied these algorithms to a dataset of 1 420 632 spectra. We were able to identify a considerable number of peptide variations within a modest amount of computing time on standard desktop computers.

MOTIVATION: Tandem mass spectrometry allows for high-throughput identification of complex protein samples. Searching tandem mass spectra against sequence databases is the main analysis method nowadays. Since many peptide variations are possible, including them in the search space seems only logical. However, the search space usually grows exponentially with the number of independent variations and may therefore overwhelm computational resources. RESULTS: We provide fast, cache-efficient search algorithms to screen large peptide search spaces including non-tryptic peptides, whole genomes, dozens of posttranslational modifications, unannotated point mutations and even unannotated splice sites. All these search spaces can be screened simultaneously. By optimizing the cache usage, we achieve a calculation speed that closely approaches the limits of the hardware. At the same time, we control the size of the overall search space by limiting the combinations of variations that can co-occur on the same peptide. Using a hypergeometric scoring scheme, we applied these algorithms to a dataset of 1 420 632 spectra. We were able to identify a considerable number of peptide variations within a modest amount of computing time on standard desktop computers.

Citations

28 citations in Web of Science®
27 citations in Scopus®
Google Scholar™

Altmetrics

Additional indexing

Item Type:Journal Article, refereed, original work
Communities & Collections:04 Faculty of Medicine > Functional Genomics Center Zurich
08 University Research Priority Programs > Systems Biology / Functional Genomics
08 University Research Priority Programs > Systems Biology / Functional Genomics
Dewey Decimal Classification:570 Life sciences; biology
610 Medicine & health
Language:English
Date:2007
Deposited On:30 Dec 2009 10:06
Last Modified:05 Apr 2016 13:36
Publisher:Oxford University Press
ISSN:1367-4803
Free access at:Publisher DOI. An embargo period may apply.
Publisher DOI:https://doi.org/10.1093/bioinformatics/btm417
PubMed ID:17768164

Download

Full text not available from this repository.View at publisher

TrendTerms

TrendTerms displays relevant terms of the abstract of this publication and related documents on a map. The terms and their relations were extracted from ZORA using word statistics. Their timelines are taken from ZORA as well. The bubble size of a term is proportional to the number of documents where the term occurs. Red, orange, yellow and green colors are used for terms that occur in the current document; red indicates high interlinkedness of a term with other terms, orange, yellow and green decreasing interlinkedness. Blue is used for terms that have a relation with the terms in this document, but occur in other documents.
You can navigate and zoom the map. Mouse-hovering a term displays its timeline, clicking it yields the associated documents.

Author Collaborations