Dataset Information

FIGG: simulating populations of whole genome sequences for heterogeneous data analyses.

ABSTRACT: BACKGROUND: High-throughput sequencing has become one of the primary tools for investigation of the molecular basis of disease. The increasing use of sequencing in investigations that aim to understand both individuals and populations is challenging our ability to develop analysis tools that scale with the data. This issue is of particular concern in studies that exhibit a wide degree of heterogeneity or deviation from the standard reference genome. The advent of population-scale sequencing studies requires analysis tools that are developed and tested against matching quantities of heterogeneous data. RESULTS: We developed a large-scale whole-genome simulation tool, FIGG, which generates large numbers of whole genomes with known sequence characteristics based on direct sampling of experimentally known or theorized variations. For normal variation, we used publicly available data to determine the frequency of different mutation classes across the genome. FIGG then uses this information as a background to generate new sequences from a parent sequence with matching frequencies, but different actual mutations. The background can be normal variations, known disease variations, or a theoretical frequency distribution of variations. CONCLUSION: In order to enable the creation of large numbers of genomes, FIGG generates simulated sequences from known genomic variation and iteratively mutates each genome separately. The result is multiple whole-genome sequences with unique variations that can be used primarily to provide different reference genomes, to model heterogeneous populations, and to offer a standard test environment for new analysis algorithms or bioinformatics tools.
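The abstract describes the core idea only at a high level: count how often each mutation class occurs in each region of the genome using known variation, then mutate copies of a parent sequence so that every simulated genome matches those frequencies while carrying different concrete mutations. The Python sketch below illustrates that idea under several assumptions that are not taken from FIGG itself: the genome is split into fixed-size fragments, known variants arrive as hypothetical `(position, mutation_class)` pairs, only three hypothetical classes (`SNV`, `insertion`, `deletion`) are modelled, and each genome receives exactly the background's per-fragment counts. The function names `class_frequencies`, `mutate_fragment` and `simulate_population` are illustrative and are not FIGG's code or API.

```python
import random
from collections import Counter

BASES = "ACGT"
# Hypothetical mutation classes for this sketch; FIGG's real class set may differ.
MUTATION_CLASSES = ("SNV", "insertion", "deletion")


def class_frequencies(known_variants, genome_length, fragment_size):
    """Count each mutation class in every fixed-size fragment of the genome.

    known_variants: iterable of (position, mutation_class) pairs, e.g. taken
    from a public variation catalogue (assumed input format).
    """
    n_fragments = (genome_length + fragment_size - 1) // fragment_size
    freqs = [Counter() for _ in range(n_fragments)]
    for pos, mut_class in known_variants:
        freqs[pos // fragment_size][mut_class] += 1
    return freqs


def mutate_fragment(fragment, counts, rng):
    """Apply the requested number of each class at freshly drawn random positions."""
    seq = list(fragment)
    for mut_class, n in counts.items():
        if mut_class not in MUTATION_CLASSES:
            raise ValueError(f"unknown mutation class: {mut_class}")
        for _ in range(n):
            if mut_class == "insertion":
                seq.insert(rng.randrange(len(seq) + 1), rng.choice(BASES))
            elif not seq:
                break  # nothing left to substitute or delete
            elif mut_class == "SNV":
                i = rng.randrange(len(seq))
                seq[i] = rng.choice([b for b in BASES if b != seq[i]])
            else:  # deletion
                del seq[rng.randrange(len(seq))]
    return "".join(seq)


def simulate_population(parent, freqs, fragment_size, n_genomes, seed=0):
    """Derive n_genomes independent genomes from one parent sequence.

    Each genome is mutated separately, fragment by fragment, so all genomes
    reproduce the background's per-fragment class counts while drawing new
    positions and bases.
    """
    rng = random.Random(seed)
    genomes = []
    for _ in range(n_genomes):
        pieces = []
        for k, counts in enumerate(freqs):
            start = k * fragment_size
            chunk = parent[start:start + fragment_size]
            pieces.append(mutate_fragment(chunk, counts, rng))
        genomes.append("".join(pieces))
    return genomes


# Toy usage: a 100 bp parent and four known variants as the background.
parent = "ACGT" * 25
known = [(3, "SNV"), (10, "SNV"), (42, "deletion"), (77, "insertion")]
freqs = class_frequencies(known, len(parent), fragment_size=50)
genomes = simulate_population(parent, freqs, fragment_size=50, n_genomes=3)
```

In this toy run the first fragment gets two SNVs and one deletion and the second gets one insertion, so every simulated genome stays 100 bp long but differs in where those changes land. Reproducing exact counts is only one reading of "matching frequencies"; sampling counts from a per-fragment rate would be equally plausible, and the abstract does not say which approach FIGG takes.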

SUBMITTER: Killcoyne S

PROVIDER: S-EPMC4039316 | biostudies-literature | 2014

REPOSITORIES: biostudies-literature


Publications

FIGG: simulating populations of whole genome sequences for heterogeneous data analyses.

Sarah Killcoyne, Antonio del Sol

BMC Bioinformatics, 2014-05-19


PMID: 24885193

Similar Datasets

Project description: Background: Domestication and improvement processes, accompanied by selection and adaptation, have generated genome-wide divergence and stratification in soybean populations. Simultaneously, soybean populations, which comprise diverse subpopulations, have developed their own adaptive characteristics enhancing fitness, resistance, agronomic traits, and morphological features. The genetic traits underlying these characteristics play a fundamental role in improving other soybean populations. Results: This study focused on identifying the selection signatures and adaptive characteristics in soybean populations. A core set of 245 accessions (112 wild-type, 79 landrace, and 54 improvement soybeans) selected from 4,234 soybean accessions was re-sequenced. Their genomic architectures were examined with respect to domestication and improvement, and the accessions were then classified into 3 wild-type, 2 landrace, and 2 improvement subgroups based on various population analyses. Selection and gene set enrichment analyses revealed that the landrace subgroups have selection signals for soybean-cyst nematode HG type 0 and for seed development with germination, and that the improvement subgroups have selection signals for plant development with viability and for seed development with embryo development, respectively. The adaptive characteristic for soybean-cyst nematode was partially underpinned by multiple resistant accessions, and the characteristics related to seed development were supported by our phenotypic findings for seed weights. Furthermore, these adaptive characteristics were also confirmed by genome-based evidence, and unique genomic regions exhibiting distinct selection and selective sweep patterns were revealed for 13 candidate genes. Conclusions: Although our findings require further biological validation, they provide valuable information about soybean breeding strategies and present new options for breeders seeking donor lines to improve soybean populations.

