Dataset Information

PolyCat: a resource for genome categorization of sequencing reads from allopolyploid organisms.

ABSTRACT: Read mapping is a fundamental part of next-generation genomic research but is complicated by genome duplication in many plants. Categorizing DNA sequence reads into their respective genomes enables current methods to analyze polyploid genomes as if they were diploid. We present PolyCat, a pipeline for mapping and categorizing all types of next-generation sequence data produced from allopolyploid organisms. PolyCat uses GSNAP's single-nucleotide polymorphism (SNP)-tolerant mapping to minimize the mapping efficiency bias caused by SNPs between genomes. PolyCat then uses SNPs between genomes to categorize reads according to their respective genomes. Bisulfite-treated reads have a significant reduction in nucleotide complexity because nucleotide conversion events are confounded with transition substitutions. PolyCat includes special provisions to properly handle bisulfite-treated data. We demonstrate the functionality of PolyCat on allotetraploid cotton, Gossypium hirsutum, and create a functional SNP index for efficiently mapping sequence reads to the D-genome sequence of G. raimondii. PolyCat is appropriate for all allopolyploids and all types of next-generation genome analysis, including differential expression (RNA sequencing), differential methylation (bisulfite sequencing), differential DNA-protein binding (chromatin immunoprecipitation sequencing), and population diversity.
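
The categorization step described in the abstract can be illustrated with a minimal sketch. This is not PolyCat's implementation: the function name `categorize_read`, the `SNP_INDEX` layout, and the category labels are assumptions made for illustration, and the sketch assumes ungapped, forward-strand alignments only. It shows the general idea of voting on genome-distinguishing SNPs and of skipping C/T SNPs for bisulfite-treated reads, where a conversion event cannot be told apart from a transition.

```python
from collections import Counter

# Hypothetical SNP index (an assumption, not PolyCat's format):
# reference position -> (A-genome base, D-genome base).
SNP_INDEX = {10: ("A", "G"), 15: ("C", "T"), 22: ("T", "A")}


def categorize_read(start, seq, snp_index, bisulfite=False):
    """Assign a mapped read to a genome using the SNPs it covers.

    Returns 'A' or 'D' when all informative SNPs agree, 'X' when they
    conflict, and 'N' when the read covers no informative SNP.
    Assumes an ungapped alignment beginning at reference position `start`.
    """
    votes = Counter()
    for pos, (a_base, d_base) in snp_index.items():
        offset = pos - start
        if not 0 <= offset < len(seq):
            continue  # this SNP is not covered by the read
        # Bisulfite treatment turns unmethylated C into T, so a C/T SNP is
        # confounded with a conversion event; treat it as uninformative.
        if bisulfite and {a_base, d_base} == {"C", "T"}:
            continue
        base = seq[offset]
        if base == a_base:
            votes["A"] += 1
        elif base == d_base:
            votes["D"] += 1
    if not votes:
        return "N"
    if len(votes) > 1:
        return "X"
    return next(iter(votes))


print(categorize_read(8, "ACAGGTACCA", SNP_INDEX))                  # A
print(categorize_read(8, "ACGGGTATCA", SNP_INDEX))                  # D
print(categorize_read(8, "ACAGGTATCA", SNP_INDEX))                  # X
print(categorize_read(8, "ACAGGTATCA", SNP_INDEX, bisulfite=True))  # A
```

In the last call the read carries a T at the C/T SNP, which would otherwise count as a D-genome vote; with `bisulfite=True` that position is ignored and only the unambiguous SNP decides the category.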

SUBMITTER: Page JT

PROVIDER: S-EPMC3583458 | biostudies-literature | 2013 Mar

REPOSITORIES: biostudies-literature

Publications

PolyCat: a resource for genome categorization of sequencing reads from allopolyploid organisms.

Page JT, Gingle AR, Udall JA

G3 (Bethesda, Md.), 2013-03-01, 3

PMID: 23450226

Similar Datasets

Whole-genome re-sequencing of non-model organisms: lessons from unmapped reads.
| S-EPMC4815510 | biostudies-literature

Functional categorization of metagenomic sequencing reads
| PRJEB36971 | ENA

Long Reads Are Revolutionizing 20 Years of Insect Genome Sequencing.
| S-EPMC8358217 | biostudies-literature

Circlator: automated circularization of genome assemblies using long sequencing reads.
| S-EPMC4699355 | biostudies-literature

Draft Genome Sequencing of the Highly Halotolerant and Allopolyploid Yeast Zygosaccharomyces rouxii NBRC 1876.
| S-EPMC5313615 | biostudies-literature

BSmooth: from whole genome bisulfite sequencing reads to differentially methylated regions.
| S-EPMC3491411 | biostudies-literature

Unicycler: Resolving bacterial genome assemblies from short and long sequencing reads.
| S-EPMC5481147 | biostudies-literature

PlasmidSeeker: identification of known plasmids from bacterial whole genome sequencing reads.
| S-EPMC5885972 | biostudies-literature

How resource abundance and resource stochasticity affect organisms' range sizes.
| S-EPMC11927164 | biostudies-literature

Nanopore sequencing and assembly of a human genome with ultra-long reads.
| S-EPMC5889714 | biostudies-literature