Dataset Information

Deep whole-genome sequencing of 90 Han Chinese genomes.

ABSTRACT: Next-generation sequencing provides a high-resolution insight into human genetic information. However, the focus of previous studies has primarily been on low-coverage data due to the high cost of sequencing. Although the 1000 Genomes Project and the Haplotype Reference Consortium have both provided powerful reference panels for imputation, low-frequency and novel variants remain difficult to discover and call with accuracy on the basis of low-coverage data. Deep sequencing provides an optimal solution for the problem of these low-frequency and novel variants. Although whole-exome sequencing is also a viable choice for exome regions, it cannot account for noncoding regions, sometimes resulting in the absence of important, causal variants. For Han Chinese populations, the majority of variants have been discovered based upon low-coverage data from the 1000 Genomes Project. However, high-coverage, whole-genome sequencing data are limited for any population, and a large number of low-frequency, population-specific variants remain uncharacterized. We have performed whole-genome sequencing at a high depth (~80×) of 90 unrelated individuals of Chinese ancestry, collected from the 1000 Genomes Project samples, including 45 Northern Han Chinese and 45 Southern Han Chinese samples. Eighty-three of these 90 have been sequenced by the 1000 Genomes Project. We have identified 12 568 804 single nucleotide polymorphisms, 2 074 210 short InDels, and 26 142 structural variations from these 90 samples. Compared to the Han Chinese data from the 1000 Genomes Project, we have found 7 000 629 novel variants with low frequency (defined as minor allele frequency < 5%), including 5 813 503 single nucleotide polymorphisms, 1 169 199 InDels, and 17 927 structural variants. Using deep sequencing data, we have built a greatly expanded spectrum of genetic variation for the Han Chinese genome.
Compared to the 1000 Genomes Project, these Han Chinese deep sequencing data enhance the characterization of a large number of low-frequency, novel variants. This will be a valuable resource for promoting Chinese genetics research and medical development. Additionally, it will provide a valuable supplement to the 1000 Genomes Project, as well as to other human genome projects.
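
The abstract reports three novel low-frequency variant counts and a total, and defines low frequency as minor allele frequency < 5%. The sketch below is only an illustration of that arithmetic, not the authors' pipeline: it checks that the three reported counts sum to the reported total, and shows what the 5% cutoff would mean for a cohort of 90 samples. The function names, the assumption of biallelic autosomal sites in diploid samples, and the choice to compute frequency within the 90 samples alone are all assumptions made here; the abstract does not say how frequencies were computed.

```python
# Counts of novel low-frequency variants, as reported in the abstract.
novel_snps = 5_813_503
novel_indels = 1_169_199
novel_svs = 17_927
reported_total = 7_000_629

computed_total = novel_snps + novel_indels + novel_svs


def minor_allele_frequency(alt_allele_count: int, n_samples: int) -> float:
    """Hypothetical helper: MAF at a biallelic autosomal site.

    Assumes diploid samples, so there are 2 * n_samples chromosomes.
    """
    n_chromosomes = 2 * n_samples
    alt_freq = alt_allele_count / n_chromosomes
    return min(alt_freq, 1.0 - alt_freq)


def is_low_frequency(alt_allele_count: int, n_samples: int = 90,
                     threshold: float = 0.05) -> bool:
    """Hypothetical helper: apply the abstract's MAF < 5% definition."""
    return minor_allele_frequency(alt_allele_count, n_samples) < threshold


if __name__ == "__main__":
    print("Counts sum to reported total:", computed_total == reported_total)
    print("8 alternate alleles in 90 samples is low frequency:", is_low_frequency(8))
    print("9 alternate alleles in 90 samples is low frequency:", is_low_frequency(9))
```

Under these assumptions, 90 samples carry 180 chromosomes, so a variant seen on at most 8 of them falls below the 5% cutoff, while 9 copies is exactly 5% and does not.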

SUBMITTER: Lan T

PROVIDER: S-EPMC5603764 | biostudies-literature | 2017 Sep

REPOSITORIES: biostudies-literature

Publications

Deep whole-genome sequencing of 90 Han Chinese genomes.

Lan Tianming, Lin Haoxiang, Zhu Wenjuan, Laurent Tellier Christian Asker Melchior, Yang Mengcheng, Liu Xin, Wang Jun, Wang Jian, Yang Huanming, Xu Xun, Guo Xiaosen

GigaScience, 2017-09-01, issue 9

PMID: 28938720

Similar Datasets

Deep whole-genome sequencing of 90 Han Chinese genomes
2017-11-18 | PRJEB20820 | EVA

Han Chinese Trio PacBio Whole Genome Sequencing
PRJEB12236 | ENA

Whole-exome sequencing and genome-wide association studies identify novel sarcopenia risk genes in Han Chinese.
S-EPMC7434604 | biostudies-literature

Rare variants discovery by extensive whole-genome sequencing of the Han Chinese population in Taiwan: Applications to cardiovascular medicine.
S-EPMC8132201 | biostudies-literature

Whole-exome sequencing identified four loci influencing craniofacial morphology in northern Han Chinese.
S-EPMC6554238 | biostudies-literature

Plantagora: modeling whole genome sequencing and assembly of plant genomes.
S-EPMC3236183 | biostudies-literature

Deep whole-genome sequencing of 100 southeast Asian Malays.
S-EPMC3542459 | biostudies-literature

Whole Genome Analyses of Chinese Population and De Novo Assembly of A Northern Han Genome.
S-EPMC6818495 | biostudies-literature

Whole genome sequencing of Chinese clearhead icefish, Protosalanx hyalocranius.
S-EPMC5530312 | biostudies-literature

Deep whole-genome sequencing of 3 cancer cell lines on 2 sequencing platforms.
S-EPMC6911065 | biostudies-literature