Dataset Information

High-quality draft assemblies of mammalian genomes from massively parallel sequence data.


ABSTRACT: Massively parallel DNA sequencing technologies are revolutionizing genomics by making it possible to generate billions of relatively short (~100-base) sequence reads at very low cost. Whereas such data can be readily used for a wide range of biomedical applications, it has proven difficult to use them to generate high-quality de novo genome assemblies of large, repeat-rich vertebrate genomes. To date, the genome assemblies generated from such data have fallen far short of those obtained with the older (but much more expensive) capillary-based sequencing approach. Here, we report the development of an algorithm for genome assembly, ALLPATHS-LG, and its application to massively parallel DNA sequence data from the human and mouse genomes, generated on the Illumina platform. The resulting draft genome assemblies have good accuracy, short-range contiguity, long-range connectivity, and coverage of the genome. In particular, the base accuracy is high (≥99.95%) and the scaffold sizes (N50 size = 11.5 Mb for human and 7.2 Mb for mouse) approach those obtained with capillary-based sequencing. The combination of improved sequencing technology and improved computational methods should now make it possible to increase dramatically the de novo sequencing of large genomes. The ALLPATHS-LG program is available at http://www.broadinstitute.org/science/programs/genome-biology/crd.

SUBMITTER: Gnerre S 

PROVIDER: S-EPMC3029755 | biostudies-literature | 2011 Jan

REPOSITORIES: biostudies-literature

Similar Datasets

| S-EPMC10942973 | biostudies-literature
| S-EPMC5666806 | biostudies-literature
| S-EPMC10557687 | biostudies-literature
| S-EPMC11229359 | biostudies-literature
| S-EPMC10563150 | biostudies-literature
| S-EPMC10104058 | biostudies-literature
| S-EPMC4641336 | biostudies-literature
| S-EPMC6771372 | biostudies-literature
| S-EPMC9778533 | biostudies-literature
| S-EPMC3402344 | biostudies-literature