Dataset Information

Virtual Genome Walking across the 32 Gb Ambystoma mexicanum genome; assembling gene models and intronic sequence.


ABSTRACT: Large, repeat-rich genomes present challenges for assembly using short-read technologies. The 32 Gb axolotl genome is estimated to contain ~19 Gb of repetitive DNA, making an assembly from short reads alone effectively impossible. Indeed, this model species has been sequenced to 20× coverage, but the reads could not be conventionally assembled. Using an alternative strategy, we have assembled subsets of these reads into scaffolds describing over 19,000 gene models. We call this method Virtual Genome Walking, as it locally assembles whole-genome reads based on a reference transcriptome, identifying exons and iteratively extending them into the surrounding genomic sequence. These assemblies are then linked and refined to generate gene models that include upstream, downstream and intronic genomic sequence. Our assemblies are validated by comparison with previously published axolotl bacterial artificial chromosome (BAC) sequences. Our analyses of axolotl intron length, intron-exon structure, repeat content and synteny provide novel insights into the genic structure of this model species. This resource will enable new experimental approaches in axolotl, such as ChIP-Seq and CRISPR, and aid future whole-genome sequencing efforts. The assembled sequences and annotations presented here are freely available for download from https://tinyurl.com/y8gydc6n. The software pipeline is available from https://github.com/LooseLab/iterassemble.
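
The idea of iteratively extending an exon into the surrounding genomic sequence can be illustrated with a toy example. The sketch below is not the iterassemble pipeline and makes no claim about how it is implemented: the function names `overlap_len` and `extend_seed`, the exact-match overlap rule, the `min_overlap` and `max_rounds` parameters, and the made-up 40-base sequence are all assumptions chosen for illustration. A real workflow would have to tolerate sequencing errors and repeats, which this sketch ignores.

```python
def overlap_len(left, right, min_overlap):
    """Length of the longest suffix of `left` that equals a prefix of `right`.

    Returns 0 if no exact overlap of at least `min_overlap` bases exists.
    """
    longest = min(len(left), len(right))
    for n in range(longest, min_overlap - 1, -1):
        if left[-n:] == right[:n]:
            return n
    return 0


def extend_seed(seed, reads, min_overlap=5, max_rounds=100):
    """Greedily grow `seed` in both directions using exactly overlapping reads.

    Hypothetical toy walker: each round uses the first read in the pool that
    hangs off either end of the contig, then rescans the remaining reads.
    """
    contig = seed
    pool = list(reads)
    for _ in range(max_rounds):
        for i, read in enumerate(pool):
            # Read hangs off the right end of the contig.
            n = overlap_len(contig, read, min_overlap)
            if 0 < n < len(read):
                contig += read[n:]
                break
            # Read hangs off the left end of the contig.
            n = overlap_len(read, contig, min_overlap)
            if 0 < n < len(read):
                contig = read[:-n] + contig
                break
        else:
            return contig  # no remaining read extends the contig
        del pool[i]  # each read is used at most once
    return contig


# Made-up 40-base "locus" tiled by 12-base reads; the seed stands in for an exon.
locus = "TTGACCGTAGGCTAACGGATCCATGTTCAGGAGTACCTGA"
toy_reads = [locus[i:i + 12] for i in range(0, 29, 4)]
walked = extend_seed(locus[15:25], toy_reads, min_overlap=5)
```

In this toy case the seed only grows while some read overlaps one of its ends by at least `min_overlap` bases, so the walk stops on its own when coverage runs out.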

SUBMITTER: Evans T 

PROVIDER: S-EPMC5766544 | biostudies-literature | 2018 Jan

REPOSITORIES: biostudies-literature

Publications

Virtual Genome Walking across the 32 Gb Ambystoma mexicanum genome; assembling gene models and intronic sequence.

Evans T, Johnson AD, Loose M

Scientific Reports, 2018-01-12, issue 1


Similar Datasets

| S-EPMC7283310 | biostudies-literature
| S-EPMC3503958 | biostudies-literature
| S-EPMC9911811 | biostudies-literature
| S-EPMC8110773 | biostudies-literature
| PRJNA1073213 | ENA
| PRJNA12938 | ENA
| PRJNA141103 | ENA
| PRJNA400170 | ENA
| PRJNA186654 | ENA
| PRJNA306100 | ENA