Dataset Information

Population-based structural variation discovery with Hydra-Multi.

ABSTRACT:

Current strategies for SNP and INDEL discovery incorporate sequence alignments from multiple individuals to maximize sensitivity and specificity. It is widely accepted that this approach also improves structural variant (SV) detection. However, multisample SV analysis has been stymied by the fundamental difficulties of SV calling, e.g. library insert size variability, SV alignment signal integration and detecting long-range genomic rearrangements involving disjoint loci. Extant tools suffer from poor scalability, which limits the number of genomes that can be co-analyzed and complicates analysis workflows. We have developed an approach that enables multisample SV analysis in hundreds to thousands of human genomes using commodity hardware. Here, we describe Hydra-Multi and measure its accuracy, speed and scalability using publicly available datasets provided by The 1000 Genomes Project and by The Cancer Genome Atlas (TCGA).

Availability and implementation

Hydra-Multi is written in C++ and is freely available at https://github.com/arq5x/Hydra.

Contact

aaronquinlan@gmail.com or ihall@genome.wustl.edu

Supplementary information

Supplementary data are available at Bioinformatics online.

SUBMITTER: Lindberg MR

PROVIDER: S-EPMC4393510 | biostudies-literature | 2015 Apr

REPOSITORIES: biostudies-literature

Publications

Population-based structural variation discovery with Hydra-Multi.

Lindberg MR, Hall IM, Quinlan AR

Bioinformatics (Oxford, England), 2014-12-02, issue 8

PMID: 25527832

Similar Datasets

Project description:

BACKGROUND: The MinION Access Program (MAP, 2014-2016) allowed selected users to test the prospects of long nanopore reads for diverse organisms and applications through the rapid development of improving chemistries. In 2014, faced with a fragmented Illumina assembly for the chloroplast genome of the green algal holobiont Caulerpa ashmeadii, we applied to the MAP to test the prospects of nanopore reads to investigate such intricacies, as well as further explore the hologenome of this species with native and hybrid approaches.

RESULTS: The chloroplast genome could only be resolved as a circular molecule in nanopore assemblies, which also revealed structural variants (i.e. chloroplast polymorphism or heteroplasmy). Signal and Illumina polishing of nanopore-assembled organelle genomes (chloroplast and mitochondrion) reflected the importance of coverage on final quality and current limitations. In hybrid assembly, our modest nanopore data sets showed encouraging results to improve assembly length, contiguity, repeat content, and binning of the larger nuclear and bacterial genomes. Profiling of the holobiont with nanopore or Illumina data unveiled a dominant Rhodospirillaceae (Alphaproteobacteria) species among six putative endosymbionts. While very fragmented, the cumulative hybrid assembly length of C. ashmeadii's nuclear genome reached 24.4 Mbp, including 2.1 Mbp in repeat, ranging closely with GenomeScope's estimate (> 26.3 Mbp, including 4.8 Mbp in repeat).

CONCLUSION: Our findings relying on a very modest number of nanopore R9 reads as compared to current output with newer chemistries demonstrate the promising prospects of the technology for the assembly and profiling of an algal hologenome and resolution of structural variation. The discovery of polymorphic 'chlorotypes' in C. ashmeadii, most likely mediated by homing endonucleases and/or retrohoming by reverse transcriptases, represents the first report of chloroplast heteroplasmy in the siphonous green algae. Improving contiguity of C. ashmeadii's nuclear and bacterial genomes will require deeper nanopore sequencing to greatly increase the coverage of these larger genomic compartments.
