Dataset Information

Reference-free comparative genomics of 174 chloroplasts.

ABSTRACT: Direct analysis of unassembled genomic data could greatly increase the power of short-read DNA sequencing technologies and allow comparative genomics of organisms without a completed reference available. Here, we compare 174 chloroplasts by analyzing the taxonomic distribution of short kmers across genomes [1]. We then assemble de novo contigs centered on informative variation. The localized de novo contigs can be separated into two major classes: tip = unique to a single genome and group = shared by a subset of genomes. Prior to assembly, we found that ~18% of the chloroplast was duplicated in the inverted repeat (IR) region across a four-fold difference in genome sizes, from a highly reduced parasitic orchid [2] to a massive algal chloroplast [3], including gnetophytes [4] and cycads [5]. The conservation of this ratio between single copy and duplicated sequence was basal among green plants, independent of photosynthesis and mechanism of genome size change, and different in gymnosperms and lower plants. Major lineages in the angiosperm clade differed in the pattern of shared kmers and de novo contigs. For example, parasitic plants demonstrated an expected accelerated overall rate of evolution, while the hemi-parasitic genomes contained a great deal more novel sequence than holo-parasitic plants, suggesting different mechanisms at different stages of genomic contraction. Additionally, the legumes are diverging more quickly and in different ways than other major families. Small duplicated fragments of the rrn23 genes were deeply conserved among seed plants, including among several species without the IR regions, indicating a crucial functional role of this duplication.
Localized de novo assembly of informative kmers greatly reduces the complexity of large comparative analyses by confining the analysis to a small partition of data and genomes relevant to the specific question, allowing direct analysis of next-gen sequence data from previously unstudied genomes and rapid discovery of informative candidate regions.
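
The tip/group distinction described in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names, the toy sequences, the choice of k = 3, and the extra "core" label (a kmer present in every genome) are all assumptions made for illustration, and reverse complements are ignored for brevity.

```python
from collections import defaultdict


def kmers(seq, k):
    """Yield every overlapping substring of length k from seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def classify_kmers(genomes, k):
    """Label each kmer by how many genomes contain it.

    genomes: dict mapping a genome name to its sequence string.
    Returns a dict mapping kmer -> "tip", "group", or "core".
    """
    owners = defaultdict(set)
    for name, seq in genomes.items():
        for km in kmers(seq, k):
            owners[km].add(name)

    classes = {}
    for km, names in owners.items():
        if len(names) == 1:
            classes[km] = "tip"      # unique to a single genome
        elif len(names) < len(genomes):
            classes[km] = "group"    # shared by a subset of genomes
        else:
            classes[km] = "core"     # hypothetical label: present in all genomes
    return classes


# Toy input (made-up sequences, not chloroplast data).
toy = {"A": "ACGTAC", "B": "ACGTTT", "C": "ACGGGG"}
classes = classify_kmers(toy, k=3)
for km in sorted(classes):
    print(km, classes[km])
```

In this toy input, ACG occurs in all three sequences, CGT in A and B only, and every other kmer in exactly one sequence. A real analysis would use longer kmers and would follow this step with localized assembly around the informative kmers, which the sketch does not attempt.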

SUBMITTER: Kua CS

PROVIDER: S-EPMC3502452 | biostudies-literature | 2012

REPOSITORIES: biostudies-literature

Publications

Reference-free comparative genomics of 174 chloroplasts.

Kua CS, Ruan J, Harting J, Ye CX, Helmus MR, Yu J, Cannon CH

PLoS One, 2012-11-20, issue 11

PMID: 23185288

Similar Datasets

Gene family assignment-free comparative genomics. | S-EPMC3526435 | biostudies-literature

metaVaR: Introducing metavariant species models for reference-free metagenomic-based population genomics. | S-EPMC7773188 | biostudies-literature

The non-human primate reference transcriptome resource (NHPRTR) for comparative functional genomics. | S-EPMC3531109 | biostudies-literature

CMG-biotools, a free workbench for basic comparative microbial genomics. | S-EPMC3618517 | biostudies-literature

Comparative Genomics of Host-Symbiont and Free-Living Oceanobacillus Species. | S-EPMC5425236 | biostudies-literature

Comparative genomics of metabolic networks of free-living and parasitic eukaryotes. | S-EPMC2858753 | biostudies-literature

Reference-free population genomics from next-generation transcriptome data and the vertebrate-invertebrate gap. | S-EPMC3623758 | biostudies-literature

Genome assembly of the JD17 soybean provides a new reference genome for comparative genomics. | S-EPMC8982393 | biostudies-literature

Chromosome-Level Reference Genomes for Two Strains of Caenorhabditis briggsae: An Improved Platform for Comparative Genomics. | S-EPMC9011032 | biostudies-literature

A reference pan-genome approach to comparative bacterial genomics: identification of novel epidemiological markers in pathogenic Campylobacter. | S-EPMC3968026 | biostudies-literature