
Dataset Information


Reference-guided assembly of four diverse Arabidopsis thaliana genomes.


ABSTRACT: We present whole-genome assemblies of four divergent Arabidopsis thaliana strains that complement the 125-Mb reference genome sequence released a decade ago. Using a newly developed reference-guided approach, we assembled large contigs from 9 to 42 Gb of Illumina short-read data from the Landsberg erecta (Ler-1), C24, Bur-0, and Kro-0 strains, which have been sequenced as part of the 1,001 Genomes Project for this species. Using alignments against the reference sequence, we first reduced the complexity of the de novo assembly and later integrated reads without similarity to the reference sequence. As an example, half of the noncentromeric C24 genome was covered by scaffolds that are longer than 260 kb, with a maximum of 2.2 Mb. Moreover, over 96% of the reference genome was covered by the reference-guided assembly, compared with only 87% with a complete de novo assembly. Comparisons with 2 Mb of dideoxy sequence reveal that the per-base error rate of the reference-guided assemblies was below 1 in 10,000. Our assemblies provide a detailed, genomewide picture of large-scale differences between A. thaliana individuals, most of which are difficult to access with alignment-consensus methods only. We demonstrate their practical relevance in studying the expression differences of polymorphic genes and show how the analysis of sRNA sequencing data can lead to erroneous conclusions if aligned against the reference genome alone. Genome assemblies, raw reads, and further information are accessible through http://1001genomes.org/projects/assemblies.html.
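The scaffold statistic quoted in the abstract (half of the noncentromeric C24 genome covered by scaffolds longer than 260 kb) is an N50-style measure. The following is a minimal sketch of how such a figure can be computed, not the authors' pipeline: the function name `n50`, the optional `genome_size` argument, and the example lengths are all hypothetical, and the record does not state exactly how the published value was derived.

```python
def n50(lengths, genome_size=None):
    """Return the length L such that sequences of length >= L
    together cover at least half of the total.

    If genome_size is given, coverage is measured against it
    (often called NG50); otherwise against the sum of the lengths.
    Returns 0 if the sequences cover less than half of genome_size.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = genome_size if genome_size is not None else sum(lengths)
    running = 0
    # Walk from the longest sequence down, accumulating covered bases.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0


# Hypothetical scaffold lengths in bp, for illustration only.
example_scaffolds = [10, 20, 30, 40]
half_cover_length = n50(example_scaffolds)
```

Passing the size of the target region as `genome_size` measures coverage against that region rather than against the assembly itself, which is closer to how the abstract phrases the statistic.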

SUBMITTER: Schneeberger K 

PROVIDER: S-EPMC3121819 | biostudies-other | 2011 Jun

REPOSITORIES: biostudies-other


Publications

Reference-guided assembly of four diverse Arabidopsis thaliana genomes.

Korbinian Schneeberger, Stephan Ossowski, Felix Ott, Juliane D. Klein, Xi Wang, Christa Lanz, Lisa M. Smith, Jun Cao, Joffrey Fitz, Norman Warthmann, Stefan R. Henz, Daniel H. Huson, Detlef Weigel

Proceedings of the National Academy of Sciences of the United States of America, 2011-06-06, issue 25



Similar Datasets

Date | Accession | Repository
- | S-EPMC4856438 | biostudies-literature
2011-02-11 | E-GEOD-24569 | biostudies-arrayexpress
2011-02-11 | GSE24569 | GEO
2011-08-29 | GSE30814 | GEO
- | S-EPMC5891936 | biostudies-literature
- | PRJEB31147 | ENA
- | S-EPMC4330339 | biostudies-literature
- | S-EPMC6816165 | biostudies-literature
- | S-EPMC4629036 | biostudies-literature
- | S-EPMC4382294 | biostudies-literature