Abstract
Gene content and gene-coding percentage can be predicted from genome size in newly sequenced organisms. Here, we investigate whether these predictions are influenced by the phylogenetic relationships among the species involved. Combining a highly resolved phylogenetic tree with a large compilation of gene content data, we find significant phylogenetic structure in the correlations between genome size and gene content in both bacteria and eukaryotes. Log(genome size) in combination with phylogeny explained 97% of the variation in log(gene content) in bacteria and 55% in eukaryotes. Further, in bacteria, gene-coding percentage is significantly correlated with genome size only when phylogenetic information is taken into account in the analyses. These findings support the use of phylogenetic correlation models for gene content predictions.