
Dataset Information


Characterization of structural variants with single molecule and hybrid sequencing approaches.


ABSTRACT: Structural variation is common in human and cancer genomes. High-throughput DNA sequencing has enabled genome-scale surveys of structural variation. However, the short reads produced by these technologies limit the study of complex variants, particularly those involving repetitive regions. Recent 'third-generation' sequencing technologies provide single-molecule templates and longer sequencing reads, but at the cost of higher per-nucleotide error rates. We present MultiBreak-SV, an algorithm to detect structural variants (SVs) from single-molecule sequencing data, paired-read sequencing data, or a combination of sequencing data from different platforms. We demonstrate that combining low-coverage third-generation data from Pacific Biosciences (PacBio) with high-coverage paired-read data is advantageous on simulated chromosomes. We apply MultiBreak-SV to PacBio data from four human fosmids and show that it detects known SVs with high sensitivity and specificity. Finally, we perform a whole-genome analysis on PacBio data from a complete hydatidiform mole cell line and predict 1002 high-probability SVs, over half of which are confirmed by an Illumina-based assembly.

SUBMITTER: Ritz A 

PROVIDER: S-EPMC4253835 | biostudies-literature | 2014 Dec

REPOSITORIES: biostudies-literature


Publications

Characterization of structural variants with single molecule and hybrid sequencing approaches.

Anna Ritz, Ali Bashir, Suzanne Sindi, David Hsu, Iman Hajirasouliha, Benjamin J. Raphael

Bioinformatics (Oxford, England), published 2014-10-28, issue 24



Similar Datasets

S-EPMC7545150 | biostudies-literature
S-EPMC5411774 | biostudies-literature
GSE105112 | GEO | 2018-10-03
S-EPMC3707490 | biostudies-literature
S-EPMC5990442 | biostudies-literature
S-EPMC9493964 | biostudies-literature
S-EPMC3912422 | biostudies-literature
PRJNA414663 | ENA
S-EPMC4022347 | biostudies-literature
S-EPMC10324673 | biostudies-literature