Dataset Information

GenomeScope 2.0 and Smudgeplot for reference-free profiling of polyploid genomes.


ABSTRACT: An important assessment prior to genome assembly and related analyses is genome profiling, where the k-mer frequencies within raw sequencing reads are analyzed to estimate major genome characteristics such as size, heterozygosity, and repetitiveness. Here we introduce GenomeScope 2.0 (https://github.com/tbenavi1/genomescope2.0), which applies combinatorial theory to establish a detailed mathematical model of how k-mer frequencies are distributed in heterozygous and polyploid genomes. We describe and evaluate a practical implementation of the polyploid-aware mixture model that quickly and accurately infers genome properties across thousands of simulated and several real datasets spanning a broad range of complexity. We also present a method called Smudgeplot (https://github.com/KamilSJaron/smudgeplot) to visualize and estimate the ploidy and genome structure of a genome by analyzing heterozygous k-mer pairs. We successfully apply the approach to systems of known variable ploidy levels in the Meloidogyne genus and the extreme case of octoploid Fragaria × ananassa.
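
As a rough illustration of the k-mer profiling idea described in the abstract, the sketch below counts canonical k-mers in a set of reads, builds the frequency spectrum (how many distinct k-mers occur once, twice, and so on), and derives a naive genome size estimate by dividing the total k-mer count by the coverage of the highest peak. This is not the GenomeScope 2.0 mixture model, which fits a polyploid-aware statistical model to the spectrum. The function names and the `min_coverage` cutoff are hypothetical choices made for this example only.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def kmer_spectrum(reads, k):
    """Map each observed coverage value to the number of distinct k-mers seen that often."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Skip k-mers containing ambiguous bases such as N.
            if set(kmer) <= set("ACGT"):
                counts[canonical(kmer)] += 1
    return Counter(counts.values())


def naive_genome_size(spectrum, min_coverage=2):
    """Estimate genome size as (total k-mer occurrences) / (coverage at the main peak).

    Assumption for this sketch: k-mers below min_coverage are treated as
    sequencing errors and ignored.
    """
    usable = {cov: n for cov, n in spectrum.items() if cov >= min_coverage}
    if not usable:
        raise ValueError("no k-mers at or above min_coverage")
    peak = max(usable, key=usable.get)
    total = sum(cov * n for cov, n in usable.items())
    return total // peak
```

On real data the spectrum would normally come from a dedicated k-mer counter and be fitted with a model that accounts for heterozygosity and ploidy; the single-peak division above only works as a first approximation for a largely homozygous, low-repeat genome.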

SUBMITTER: Ranallo-Benavidez TR 

PROVIDER: S-EPMC7080791 | biostudies-literature | 2020 Mar

REPOSITORIES: biostudies-literature

Publications

GenomeScope 2.0 and Smudgeplot for reference-free profiling of polyploid genomes.

T. Rhyker Ranallo-Benavidez, Kamil S. Jaron, Michael C. Schatz

Nature Communications, 2020-03-18, issue 1


Similar Datasets

| PRJEB59232 | ENA
| S-EPMC5870704 | biostudies-literature
| S-EPMC6938933 | biostudies-literature
| S-EPMC7066127 | biostudies-literature
| PRJEB49424 | ENA
2022-12-01 | GSE157143 | GEO
| S-EPMC5287235 | biostudies-literature
| S-EPMC6122573 | biostudies-literature
| S-EPMC2238982 | biostudies-literature
| S-EPMC7780692 | biostudies-literature