Dataset Information

Accurate and exact CNV identification from targeted high-throughput sequence data.

ABSTRACT:

Background

Massively parallel sequencing of barcoded DNA samples significantly increases screening efficiency for clinically important genes. Short read aligners are well suited to single nucleotide and indel detection. However, methods for CNV detection from targeted enrichment are lacking. We present a method combining coverage with map information for the identification of deletions and duplications in targeted sequence data.

Results

Sequencing data is first scanned for gains and losses using a comparison of normalized coverage data between samples. CNV calls are confirmed by testing for a signature of sequences that span the CNV breakpoint. With our method, CNVs can be identified regardless of whether breakpoints are within regions targeted for sequencing. For CNVs where at least one breakpoint is within targeted sequence, exact CNV breakpoints can be identified. In a test data set of 96 subjects sequenced across ~1 Mb genomic sequence using multiplexing technology, our method detected mutations as small as 31 bp, predicted quantitative copy count, and had a low false-positive rate.

Conclusions

Application of this method allows for identification of gains and losses in targeted sequence data, providing comprehensive mutation screening when combined with a short read aligner.

SUBMITTER: Nord AS

PROVIDER: S-EPMC3088570 | biostudies-literature | 2011 Apr

REPOSITORIES: biostudies-literature

ACCESS DATA

Publications

Accurate and exact CNV identification from targeted high-throughput sequence data.

Nord Alex S AS Lee Ming M King Mary-Claire MC Walsh Tom T

BMC genomics 20110412

<h4>Background</h4>Massively parallel sequencing of barcoded DNA samples significantly increases screening efficiency for clinically important genes. Short read aligners are well suited to single nucleotide and indel detection. However, methods for CNV detection from targeted enrichment are lacking. We present a method combining coverage with map information for the identification of deletions and duplications in targeted sequence data.<h4>Results</h4>Sequencing data is first scanned for gains a ...[more]

PMID: 21486468

Similar Datasets

Project description:BackgroundTemperate phages influence the density, diversity and function of bacterial populations. Historically, they have been described as carriers of toxins. More recently, they have also been recognised as direct modulators of the gut microbiome, and indirectly of host health and disease. Despite recent advances in studying prophages using non-targeted sequencing approaches, methodological challenges in identifying inducible prophages in bacterial genomes and quantifying their activity have limited our understanding of prophage-host interactions.ResultsWe present methods for using high-throughput sequencing data to locate inducible prophages, including those previously undiscovered, to quantify prophage activity and to investigate their replication. We first used the well-established Salmonella enterica serovar Typhimurium/p22 system to validate our methods for (i) quantifying phage-to-host ratios and (ii) accurately locating inducible prophages in the reference genome based on phage-to-host ratio differences and read alignment alterations between induced and non-induced prophages. Investigating prophages in bacterial strains from a murine gut model microbiota known as Oligo-MM12 or sDMDMm2, we located five novel inducible prophages in three strains, quantified their activity and showed signatures of lateral transduction potential for two of them. Furthermore, we show that the methods were also applicable to metagenomes of induced faecal samples from Oligo-MM12 mice, including for strains with a relative abundance below 1%, illustrating its potential for the discovery of inducible prophages also in more complex metagenomes. Finally, we show that predictions of prophage locations in reference genomes of the strains we studied were variable and inconsistent for four bioinformatic tools we tested, which highlights the importance of their experimental validation.ConclusionsThis study demonstrates that the integration of experimental induction and bioinformatic analysis presented here is a powerful approach to accurately locate inducible prophages using high-throughput sequencing data and to quantify their activity. The ability to generate such quantitative information will be critical in helping us to gain better insights into the factors that determine phage activity and how prophage-bacteria interactions influence our microbiome and impact human health. Video abstract.

Dataset Information

Accurate and exact CNV identification from targeted high-throughput sequence data.

Background

Results

Conclusions

Publications

Accurate and exact CNV identification from targeted high-throughput sequence data.

Similar Datasets

OmicsDI is part of the ELIXIR infrastructure

Tweets