A phylogeny-based benchmarking test for orthology inference reveals the limitations of function-based validation


Trachana, Kalliopi; Forslund, Kristoffer; Larsson, Tomas; Powell, Sean; Doerks, Tobias; von Mering, Christian; Bork, Peer (2014). A phylogeny-based benchmarking test for orthology inference reveals the limitations of function-based validation. PLoS ONE, 9(11):e111122.

Abstract

Accurate orthology prediction is crucial for many applications in the post-genomic era. The lack of broadly accepted benchmark tests precludes a comprehensive analysis of orthology inference. So far, functional annotation between orthologs serves as a performance proxy. However, this violates the fundamental principle of orthology as an evolutionary definition, while it is often not applicable due to limited experimental evidence for most species. Therefore, we constructed high quality "gold standard" orthologous groups that can serve as a benchmark set for orthology inference in bacterial species. Herein, we used this dataset to demonstrate 1) why a manually curated, phylogeny-based dataset is more appropriate for benchmarking orthology than other popular practices and 2) how it guides database design and parameterization through careful error quantification. More specifically, we illustrate how function-based tests often fail to identify false assignments, misjudging the true performance of orthology inference methods. We also examined how our dataset can instruct the selection of a "core" species repertoire to improve detection accuracy. We conclude that including more genomes at the proper evolutionary distances can influence the overall quality of orthology detection. The curated gene families, called Reference Orthologous Groups, are publicly available at http://eggnog.embl.de/orthobench2.

Additional indexing

Item Type: Journal Article, refereed, original work
Communities & Collections: 07 Faculty of Science > Institute of Molecular Life Sciences
08 University Research Priority Programs > Systems Biology / Functional Genomics
08 University Research Priority Programs > Evolution in Action: From Genomes to Ecosystems
Dewey Decimal Classification: 570 Life sciences; biology
Language: English
Date: 2014
Deposited On: 21 Jan 2016 11:58
Last Modified: 05 Apr 2016 19:57
Publisher: Public Library of Science (PLoS)
ISSN: 1932-6203
Free access at: Publisher DOI. An embargo period may apply.
Publisher DOI: https://doi.org/10.1371/journal.pone.0111122
PubMed ID: 25369365

Download

Content: Published Version
Filetype: PDF
Size: 2MB
Licence: Creative Commons: Attribution 4.0 International (CC BY 4.0)
