
Dataset Information


PBHoney: identifying genomic variants via long-read discordance and interrupted mapping.


ABSTRACT: BACKGROUND: As resequencing projects become more prevalent across a larger number of species, accurate variant identification will further elucidate the nature of genetic diversity and become increasingly relevant in genomic studies. However, the identification of larger genomic variants via DNA sequencing is limited both by the incomplete information provided by sequencing reads and by the nature of the genome itself. Long-read sequencing technologies provide high-resolution access to structural variants that are often inaccessible to shorter reads. RESULTS: We present PBHoney, software that considers both intra-read discordance and the soft-clipped tails of long reads (>10,000 bp) to identify structural variants. As a proof of concept, we use PBHoney to identify four structural variants and two genomic features in a strain of Escherichia coli and validate them via de novo assembly. PBHoney is available for download at http://sourceforge.net/projects/pb-jelly/. CONCLUSIONS: PBHoney, which implements two variant-identification approaches that exploit the high mappability of long reads, is shown to be effective at detecting larger structural variants in whole-genome Pacific Biosciences RS II Continuous Long Reads. Furthermore, PBHoney is able to discover two genomic features: the presence of Rac-Phage in the isolate, and evidence of E. coli's circular genome.
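To make the "soft-clipped tails" idea concrete, the following is a minimal sketch of how candidate breakpoints could be read off SAM-style alignments. It is not PBHoney's implementation: the function names `soft_clip_breakpoints` and `cluster_breakpoints` and the parameters `min_clip`, `window` and `min_support` are hypothetical. The sketch assumes 1-based leftmost positions and CIGAR strings as defined in the SAM format, and it groups breakpoints using simple fixed-width bins.

```python
import re
from collections import Counter

# Matches one CIGAR operation, e.g. "600S" or "9000M" (SAM operation codes).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def soft_clip_breakpoints(pos, cigar, min_clip=100):
    """Return reference coordinates where a long soft-clipped tail begins.

    pos is the 1-based leftmost aligned reference position, as in SAM.
    min_clip is a hypothetical threshold for what counts as a "long" tail.
    """
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    breakpoints = []
    # Leading tail: the read stops matching just before its first aligned base.
    if ops and ops[0][1] == "S" and ops[0][0] >= min_clip:
        breakpoints.append(pos)
    # Trailing tail: sum the reference-consuming operations to find the
    # last aligned reference base.
    if ops and ops[-1][1] == "S" and ops[-1][0] >= min_clip:
        ref_len = sum(n for n, op in ops if op in "MDN=X")
        breakpoints.append(pos + ref_len - 1)
    return breakpoints


def cluster_breakpoints(alignments, min_clip=100, window=50, min_support=3):
    """Group breakpoints into fixed-width bins and keep well-supported bins.

    alignments is an iterable of (pos, cigar) pairs. This is a simplification:
    fixed bins can split a true cluster that straddles a bin boundary.
    """
    counts = Counter()
    for pos, cigar in alignments:
        for bp in soft_clip_breakpoints(pos, cigar, min_clip):
            counts[bp // window] += 1
    # Report the start coordinate of each bin with enough supporting reads.
    return sorted(b * window for b, c in counts.items() if c >= min_support)
```

For example, three reads whose long leading tails all begin near reference position 1,000 would be reported as one candidate site, while a single clipped read elsewhere would fall below `min_support`. A production caller would also need to model the intra-read discordance signal mentioned in the abstract, which this sketch does not attempt.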

SUBMITTER: English AC 

PROVIDER: S-EPMC4082283 | biostudies-literature | 2014

REPOSITORIES: biostudies-literature


Publications

PBHoney: identifying genomic variants via long-read discordance and interrupted mapping.

English AC, Salerno WJ, Reid JG

BMC Bioinformatics, 10 June 2014



Similar Datasets

| S-EPMC7026727 | biostudies-literature
| S-EPMC5860216 | biostudies-literature
| S-EPMC8463296 | biostudies-literature
| S-EPMC7641771 | biostudies-literature
| S-EPMC7477834 | biostudies-literature
| S-EPMC8491908 | biostudies-literature
| S-EPMC3549798 | biostudies-other
| S-EPMC5514472 | biostudies-literature
| S-EPMC7465313 | biostudies-literature
| S-EPMC8144375 | biostudies-literature