ZORA (Zurich Open Repository and Archive)

An integrative strategy to identify the entire protein coding potential of prokaryotic genomes by proteogenomics

Omasits, Ulrich; Varadarajan, Adithi R; Schmid, Michael; Goetze, Sandra; Melidis, Damianos; Bourqui, Marc; Nikolayeva, Olga; Québatte, Maxime; Patrignani, Andrea; Dehio, Christoph; Frey, Juerg E; Robinson, Mark D; Wollscheid, Bernd; Ahrens, Christian H (2017). An integrative strategy to identify the entire protein coding potential of prokaryotic genomes by proteogenomics. Genome Research, 27(12):2083-2095.

Abstract

Accurate annotation of all protein-coding sequences (CDSs) is an essential prerequisite to fully exploit the rapidly growing repertoire of completely sequenced prokaryotic genomes. However, large discrepancies among the number of CDSs annotated by different resources, missed functional short open reading frames (sORFs), and overprediction of spurious ORFs represent serious limitations. Our strategy toward accurate and complete genome annotation consolidates CDSs from multiple reference annotation resources, ab initio gene prediction algorithms and in silico ORFs (a modified six-frame translation considering alternative start codons) in an integrated proteogenomics database (iPtgxDB) that covers the entire protein-coding potential of a prokaryotic genome. By extending the PeptideClassifier concept of unambiguous peptides for prokaryotes, close to 95% of the identifiable peptides imply one distinct protein, largely simplifying downstream analysis. Searching a comprehensive Bartonella henselae proteomics data set against such an iPtgxDB allowed us to unambiguously identify novel ORFs uniquely predicted by each resource, including lipoproteins, differentially expressed and membrane-localized proteins, novel start sites and wrongly annotated pseudogenes. Most novelties were confirmed by targeted, parallel reaction monitoring mass spectrometry, including unique ORFs and single amino acid variations (SAAVs) identified in a re-sequenced laboratory strain that are not present in its reference genome. We demonstrate the general applicability of our strategy for genomes with varying GC content and distinct taxonomic origin. We release iPtgxDBs for B. henselae, Bradyrhizobium diazoefficiens and Escherichia coli and the software to generate both proteogenomics search databases and integrated annotation files that can be viewed in a genome browser for any prokaryote.
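The in silico ORF component mentioned in the abstract (a six-frame translation that considers alternative start codons) can be illustrated with a minimal sketch. This is not the authors' released software. The start codon set (ATG, GTG, TTG), the rule of reporting only the first start in each stop-delimited stretch, the `min_aa` cutoff, and the function names are all assumptions made for illustration.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Assumption: a common set of prokaryotic start codons, not taken from the paper.
START_CODONS = {"ATG", "GTG", "TTG"}


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_orfs(genome, min_aa=2):
    """Enumerate start-to-stop ORFs in all six reading frames.

    Returns tuples (strand, frame, start, end, protein). Coordinates are
    0-based, end-exclusive, and relative to the scanned strand, so minus-strand
    hits are given on the reverse complement. Hypothetical helper for
    illustration only.
    """
    genome = genome.upper()
    orfs = []
    for strand, seq in (("+", genome), ("-", reverse_complement(genome))):
        for frame in range(3):
            start, protein = None, []
            for pos in range(frame, len(seq) - 2, 3):
                codon = seq[pos:pos + 3]
                aa = CODON_TABLE.get(codon, "X")  # X for ambiguous bases
                if start is None:
                    if codon in START_CODONS:
                        # An initiating codon is translated as methionine.
                        start, protein = pos, ["M"]
                elif aa == "*":
                    if len(protein) >= min_aa:
                        orfs.append((strand, frame, start, pos + 3, "".join(protein)))
                    start = None
                else:
                    protein.append(aa)
    return orfs
```

For example, `find_orfs("GTGAAATAA")` reports a single plus-strand ORF that begins at the alternative start codon GTG and translates to `MK`. A realistic pipeline would also handle ORFs that run off the end of a contig and map minus-strand coordinates back to the forward strand.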

Additional indexing

Item Type:Journal Article, refereed, original work
Communities & Collections:07 Faculty of Science > Institute of Molecular Life Sciences
Dewey Decimal Classification:570 Life sciences; biology
Scopus Subject Areas:Life Sciences > Genetics; Health Sciences > Genetics (clinical)
Language:English
Date:22 December 2017
Deposited On:09 Jan 2018 09:33
Last Modified:21 Aug 2024 03:35
Publisher:Cold Spring Harbor Laboratory Press
ISSN:1088-9051
OA Status:Hybrid
Free access at:PubMed ID. An embargo period may apply.
Publisher DOI:https://doi.org/10.1101/gr.218255.116
PubMed ID:29141959
Full Text:Published Version (English)
Licence:Creative Commons: Attribution-NonCommercial 4.0 International (CC BY-NC 4.0)

Citations

48 citations in Web of Science®
55 citations in Scopus®