Header

UZH-Logo

Maintenance Infos

Improved prediction of peptide detectability for targeted proteomics using a rank-based algorithm and organism-specific data.


Qeli, Ermir; Omasits, Ulrich; Goetze, Sandra; Stekhoven, Daniel J; Frey, Juerg E; Basler, Konrad; Wollscheid, Bernd; Brunner, Erich; Ahrens, Christian H (2014). Improved prediction of peptide detectability for targeted proteomics using a rank-based algorithm and organism-specific data. Journal of Proteomics, 108:269-283.

Abstract

UNLABELLED: The in silico prediction of the best-observable "proteotypic" peptides in mass spectrometry-based workflows is a challenging problem. Being able to accurately predict such peptides would enable the informed selection of proteotypic peptides for targeted quantification of previously observed and non-observed proteins for any organism, with a significant impact for clinical proteomics and systems biology studies. Current prediction algorithms rely on physicochemical parameters in combination with positive and negative training sets to identify those peptide properties that most profoundly affect their general detectability. Here we present PeptideRank, an approach that uses learning to rank algorithm for peptide detectability prediction from shotgun proteomics data, and that eliminates the need to select a negative dataset for the training step. A large number of different peptide properties are used to train ranking models in order to predict a ranking of the best-observable peptides within a protein. Empirical evaluation with rank accuracy metrics showed that PeptideRank complements existing prediction algorithms. Our results indicate that the best performance is achieved when it is trained on organism-specific shotgun proteomics data, and that PeptideRank is most accurate for short to medium-sized and abundant proteins, without any loss in prediction accuracy for the important class of membrane proteins.
BIOLOGICAL SIGNIFICANCE: Targeted proteomics approaches have been gaining a lot of momentum and hold immense potential for systems biology studies and clinical proteomics. However, since only very few complete proteomes have been reported to date, for a considerable fraction of a proteome there is no experimental proteomics evidence that would allow to guide the selection of the best-suited proteotypic peptides (PTPs), i.e. peptides that are specific to a given proteoform and that are repeatedly observed in a mass spectrometer. We describe a novel, rank-based approach for the prediction of the best-suited PTPs for targeted proteomics applications. By building on methods developed in the field of information retrieval (e.g. web search engines like Google's PageRank), we circumvent the delicate step of selecting positive and negative training sets and at the same time also more closely reflect the experimentalist´s need for selecting e.g. the 5 most promising peptides for targeting a protein of interest. This approach allows to predict PTPs for not yet observed proteins or for organisms without prior experimental proteomics data such as many non-model organisms.

Abstract

UNLABELLED: The in silico prediction of the best-observable "proteotypic" peptides in mass spectrometry-based workflows is a challenging problem. Being able to accurately predict such peptides would enable the informed selection of proteotypic peptides for targeted quantification of previously observed and non-observed proteins for any organism, with a significant impact for clinical proteomics and systems biology studies. Current prediction algorithms rely on physicochemical parameters in combination with positive and negative training sets to identify those peptide properties that most profoundly affect their general detectability. Here we present PeptideRank, an approach that uses learning to rank algorithm for peptide detectability prediction from shotgun proteomics data, and that eliminates the need to select a negative dataset for the training step. A large number of different peptide properties are used to train ranking models in order to predict a ranking of the best-observable peptides within a protein. Empirical evaluation with rank accuracy metrics showed that PeptideRank complements existing prediction algorithms. Our results indicate that the best performance is achieved when it is trained on organism-specific shotgun proteomics data, and that PeptideRank is most accurate for short to medium-sized and abundant proteins, without any loss in prediction accuracy for the important class of membrane proteins.
BIOLOGICAL SIGNIFICANCE: Targeted proteomics approaches have been gaining a lot of momentum and hold immense potential for systems biology studies and clinical proteomics. However, since only very few complete proteomes have been reported to date, for a considerable fraction of a proteome there is no experimental proteomics evidence that would allow to guide the selection of the best-suited proteotypic peptides (PTPs), i.e. peptides that are specific to a given proteoform and that are repeatedly observed in a mass spectrometer. We describe a novel, rank-based approach for the prediction of the best-suited PTPs for targeted proteomics applications. By building on methods developed in the field of information retrieval (e.g. web search engines like Google's PageRank), we circumvent the delicate step of selecting positive and negative training sets and at the same time also more closely reflect the experimentalist´s need for selecting e.g. the 5 most promising peptides for targeting a protein of interest. This approach allows to predict PTPs for not yet observed proteins or for organisms without prior experimental proteomics data such as many non-model organisms.

Statistics

Citations

10 citations in Web of Science®
10 citations in Scopus®
Google Scholar™

Altmetrics

Additional indexing

Item Type:Journal Article, refereed, original work
Communities & Collections:07 Faculty of Science > Institute of Molecular Life Sciences
Dewey Decimal Classification:570 Life sciences; biology
Uncontrolled Keywords:Machine learning; Peptide detectability; Proteotypic peptides; Rank prediction algorithms; SRM; Targeted proteomics
Language:English
Date:28 August 2014
Deposited On:17 Nov 2014 15:22
Last Modified:05 Apr 2016 18:30
Publisher:Elsevier
ISSN:1874-3919
Publisher DOI:https://doi.org/10.1016/j.jprot.2014.05.011
PubMed ID:24878426

Download

Full text not available from this repository.
View at publisher

Article Networks

TrendTerms

TrendTerms displays relevant terms of the abstract of this publication and related documents on a map. The terms and their relations were extracted from ZORA using word statistics. Their timelines are taken from ZORA as well. The bubble size of a term is proportional to the number of documents where the term occurs. Red, orange, yellow and green colors are used for terms that occur in the current document; red indicates high interlinkedness of a term with other terms, orange, yellow and green decreasing interlinkedness. Blue is used for terms that have a relation with the terms in this document, but occur in other documents.
You can navigate and zoom the map. Mouse-hovering a term displays its timeline, clicking it yields the associated documents.

Author Collaborations