Pushing the limits of de novo genome assembly for complex prokaryotic genomes harboring very long, near identical repeats


Schmid, Michael; Frei, Daniel; Patrignani, Andrea; Schlapbach, Ralph; Frey, Jürg E; Remus-Emsermann, Mitja N P; Ahrens, Christian H (2018). Pushing the limits of de novo genome assembly for complex prokaryotic genomes harboring very long, near identical repeats. Nucleic Acids Research, 46(17):8953-8965.

Abstract

Generating a complete, de novo genome assembly for prokaryotes is often considered a solved problem. However, we here show that Pseudomonas koreensis P19E3 harbors multiple, near identical repeat pairs up to 70 kilobase pairs in length, which contained several genes that may confer fitness advantages to the strain. Its complex genome, which also included a variable shufflon region, could not be de novo assembled with long reads produced by Pacific Biosciences' technology, but required very long reads from Oxford Nanopore Technologies. Importantly, a repeat analysis, whose results we release for over 9600 prokaryotes, indicated that very complex bacterial genomes represent a general phenomenon beyond Pseudomonas. Roughly 10% of 9331 complete bacterial and a handful of 293 complete archaeal genomes represented this 'dark matter' for de novo genome assembly of prokaryotes. Several of these 'dark matter' genome assemblies contained repeats far beyond the resolution of the sequencing technology employed and likely contain errors; other genomes were closed employing labor-intense steps like cosmid libraries, primer walking or optical mapping. Using very long sequencing reads in combination with assembly algorithms capable of resolving long, near identical repeats will bring most prokaryotic genomes within reach of fast and complete de novo genome assembly.

Statistics

Citations

11 citations in Web of Science®
6 citations in Scopus®

Downloads

25 downloads since deposited on 14 Feb 2019
25 downloads in the past 12 months

Additional indexing

Item Type: Journal Article, refereed, original work
Communities & Collections: 04 Faculty of Medicine > Functional Genomics Center Zurich
Dewey Decimal Classification: 570 Life sciences; biology; 610 Medicine & health
Language: English
Date: 28 September 2018
Deposited On: 14 Feb 2019 15:28
Last Modified: 25 Sep 2019 00:07
Publisher: Oxford University Press
ISSN: 0305-1048
OA Status: Gold
Free access at: Publisher DOI. An embargo period may apply.
Publisher DOI: https://doi.org/10.1093/nar/gky726
PubMed ID: 30137508

Download

Content: Published Version
Filetype: PDF
Size: 2MB
Licence: Creative Commons: Attribution 4.0 International (CC BY 4.0)