Dataset Information

HICANCER: accurate and complete cancer genome phasing with Hi-C reads.

ABSTRACT: Because cancer genomes are highly complex, it has so far been too difficult to generate a complete cancer genome map containing the sequence of every DNA molecule. Nevertheless, phasing each chromosome of a cancer genome into two haplotypes according to germline mutations provides a suboptimal solution for understanding the cancer genome. However, phasing a cancer genome is itself a challenging problem because of the limits of current experimental and computational technologies. Hi-C data have been widely used for phasing in recent years because of their long-range linkage information, and they offer an opportunity to solve the problem of phasing cancer genomes. Existing Hi-C based phasing methods cannot be applied to cancer genomes directly, because somatic mutations such as somatic SNPs, copy number variations and structural variations greatly reduce their correctness and completeness. Here, we propose HICANCER, a new Hi-C based pipeline for phasing cancer genomes. HICANCER handles the different kinds of somatic mutations and variations, and takes advantage of allelic copy number imbalance and linkage disequilibrium to improve the correctness and completeness of phasing. In our experiments on the K562 and KBM-7 cell lines, HICANCER generated very high-quality chromosome-level haplotypes for cancer genomes using only Hi-C data.

SUBMITTER: Pan W

PROVIDER: S-EPMC7987978 | biostudies-literature | 2021 Mar

REPOSITORIES: biostudies-literature

Publications

HICANCER: accurate and complete cancer genome phasing with Hi-C reads.

Pan Weihua, Gong Desheng, Sun Da, Luo Haohui

Scientific Reports, 2021 Mar 23, issue 1

PMID: 33758310

Similar Datasets

Haplotype threading: accurate polyploid phasing from long reads.
| S-EPMC7504856 | biostudies-literature

Fast and accurate mapping of long reads to complete genome assemblies with VerityMap.
| S-EPMC9808623 | biostudies-literature

Accurate genome-wide phasing from IBD data.
| S-EPMC9686111 | biostudies-literature

Extended haplotype-phasing of long-read de novo genome assemblies using Hi-C.
| S-EPMC8081726 | biostudies-literature

Accurate genome relative abundance estimation based on shotgun metagenomic reads.
| S-EPMC3232206 | biostudies-literature

Complete genome phasing of family quartet by combination of genetic, physical and population-based phasing analysis.
| S-EPMC3669306 | biostudies-literature

Accurate indel prediction using paired-end short reads.
| S-EPMC3614465 | biostudies-other

Hapo-G, haplotype-aware polishing of genome assemblies with accurate reads.
| S-EPMC8092372 | biostudies-literature

Fast and accurate de novo genome assembly from long uncorrected reads.
| S-EPMC5411768 | biostudies-literature

Fast and accurate genotype imputation in genome-wide association studies through pre-phasing.
| S-EPMC3696580 | biostudies-literature