Dataset Information

Building a Chinese pan-genome of 486 individuals.

ABSTRACT: Pan-genome sequence analysis of human population ancestry is critical for expanding and better defining human genome sequence diversity. However, the amount of genetic variation still missing from current human reference sequences is still unknown. Here, we used 486 deep-sequenced Han Chinese genomes to identify 276 Mbp of DNA sequences that, to our knowledge, are absent in the current human reference. We classified these sequences into individual-specific and common sequences, and propose that the common sequence size is uncapped with a growing population. The 46.646 Mbp common sequences obtained from the 486 individuals improved the accuracy of variant calling and mapping rate when added to the reference genome. We also analyzed the genomic positions of these common sequences and found that they came from genomic regions characterized by high mutation rate and low pathogenicity. Our study authenticates the Chinese pan-genome as representative of DNA sequences specific to the Han Chinese population missing from the GRCh38 reference genome and establishes the newly defined common sequences as candidates to supplement the current human reference.

SUBMITTER: Li Q

PROVIDER: S-EPMC8405635 | biostudies-literature | 2021 Aug

REPOSITORIES: biostudies-literature

Publications

Building a Chinese pan-genome of 486 individuals.

Li Qiuhui, Tian Shilin, Yan Bin, Liu Chi Man, Lam Tak-Wah, Li Ruiqiang, Luo Ruibang

Communications Biology, 2021-08-30, issue 1

PMID: 34462542

Similar Datasets

Building a pan-genome reference for a population.
| S-EPMC4424974 | biostudies-literature

Toward building a comprehensive human pan-genome: The SEN-GENOME project.
| S-EPMC11480787 | biostudies-literature

Pan-genome analysis of three main Chinese chestnut varieties.
| S-EPMC9358723 | biostudies-literature

Insights into the pan-microbiome: skin microbial communities of Chinese individuals differ from other racial groups.
| S-EPMC4503953 | biostudies-literature

The Salmonella enterica pan-genome.
| S-EPMC3175032 | biostudies-literature

Genome-Wide Polygenic Risk Score for Predicting High Risk Glaucoma Individuals of Han Chinese Ancestry.
| S-EPMC8618593 | biostudies-literature

Improving pan-genome annotation using whole genome multiple alignment.
| S-EPMC3142524 | biostudies-literature

panX: pan-genome analysis and exploration.
| S-EPMC5758898 | biostudies-literature

Unraveling Allelic Impacts on Pre-Harvest Sprouting Resistance in TaVP1-B of Chinese Wheat Accessions Using Pan-Genome.
| S-EPMC11859669 | biostudies-literature