Dataset Information

Integrating read-based and population-based phasing for dense and accurate haplotyping of individual genomes.


ABSTRACT: MOTIVATION: Reconstruction of haplotypes for human genomes is an important problem in medical and population genetics. Hi-C sequencing generates read pairs with long-range haplotype information that can be computationally assembled to generate chromosome-spanning haplotypes. However, the haplotypes have limited completeness and low accuracy. Haplotype information from population reference panels can potentially be used to improve the completeness and accuracy of Hi-C haplotyping. RESULTS: In this paper, we describe a likelihood-based method to integrate short-range haplotype information from a population reference panel of haplotypes with the long-range haplotype information present in sequence reads from methods such as Hi-C, in order to assemble dense and highly accurate haplotypes for individual genomes. Our method leverages a statistical phasing method and a maximum spanning tree algorithm to determine the optimal second-order approximation of the population-based haplotype likelihood for an individual genome. The population-based likelihood is encoded using pseudo-reads, which are then used as input, along with sequence reads, for haplotype assembly using an existing tool, HapCUT2. Using whole-genome Hi-C data for two human genomes (NA19240 and NA12878), we demonstrate that this integrated phasing method enables the phasing of 97-98% of variants, reduces the switch error rates by 3-6-fold, and outperforms an existing method for combining phase information from sequence reads with population-based phasing. On Strand-seq data for NA12878, our method improves the haplotype completeness from 71.4 to 94.6% and reduces the switch error rate 2-fold, demonstrating its utility for phasing using multiple sequencing technologies. AVAILABILITY AND IMPLEMENTATION: Code and datasets are available at https://github.com/vibansal/IntegratedPhasing.
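
The abstract mentions a maximum spanning tree step for choosing which pairwise relationships between variants to keep. As a rough illustration only, the sketch below runs Kruskal's algorithm with the heaviest edges taken first. The function name `maximum_spanning_tree`, the `(weight, i, j)` edge format, and the example weights are all hypothetical; the abstract does not say how the pairwise weights are derived, and this is not the code from the IntegratedPhasing repository.

```python
def maximum_spanning_tree(num_variants, weighted_pairs):
    """Return the edges of a maximum spanning forest as (i, j, weight) tuples.

    weighted_pairs is an iterable of (weight, i, j), where i and j are
    variant indices in range(num_variants). The weights are placeholders
    for whatever pairwise score a phasing method assigns (an assumption).
    """
    parent = list(range(num_variants))  # union-find parent pointers

    def find(x):
        # Find the component root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    # Kruskal's algorithm: visit edges from heaviest to lightest and keep
    # an edge only if it joins two components that are still separate.
    for weight, i, j in sorted(weighted_pairs, reverse=True):
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j
            tree.append((i, j, weight))
    return tree


# Hypothetical pairwise scores between four heterozygous variants.
pairs = [(0.9, 0, 1), (0.2, 0, 2), (0.8, 1, 2), (0.5, 2, 3), (0.7, 1, 3)]
tree = maximum_spanning_tree(4, pairs)
```

With four variants the result has three edges, so each variant is linked to the others through the highest-scoring pairs and no cycle is formed.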

SUBMITTER: Bansal V 

PROVIDER: S-EPMC6612846 | biostudies-literature | 2019 Jul

REPOSITORIES: biostudies-literature

Publications

Integrating read-based and population-based phasing for dense and accurate haplotyping of individual genomes.

Bansal, Vikas

Bioinformatics (Oxford, England), 2019-07-01, issue 14


Similar Datasets

| S-EPMC5670131 | biostudies-literature
| S-EPMC4162929 | biostudies-literature
| S-EPMC4786454 | biostudies-literature
| S-EPMC6214236 | biostudies-literature
| S-EPMC403814 | biostudies-literature
| S-EPMC4937318 | biostudies-literature
| S-EPMC4756330 | biostudies-literature
| S-EPMC7673114 | biostudies-literature
| S-EPMC7150542 | biostudies-literature
| S-EPMC3299995 | biostudies-literature