Dataset Information

The ChinaMAP analytics of deep whole genome sequences in 10,588 individuals.

ABSTRACT: Metabolic diseases are the most common and rapidly growing health issues worldwide. The massive population-based human genetics is crucial for the precise prevention and intervention of metabolic disorders. The China Metabolic Analytics Project (ChinaMAP) is based on cohort studies across diverse regions and ethnic groups with metabolic phenotypic data in China. Here, we describe the centralized analysis of the deep whole genome sequencing data and the genetic bases of metabolic traits in 10,588 individuals from the ChinaMAP. The frequency spectrum of variants, population structure, pathogenic variants and novel genomic characteristics were analyzed. The individual genetic evaluations of Mendelian diseases, nutrition and drug metabolism, and traits of blood glucose and BMI were integrated. Our study establishes a large-scale and deep resource for the genetics of East Asians and provides opportunities for novel genetic discoveries of metabolic characteristics and disorders.

SUBMITTER: Cao Y

PROVIDER: S-EPMC7609296 | biostudies-literature | 2020 Sep

REPOSITORIES: biostudies-literature

Publications

The ChinaMAP analytics of deep whole genome sequences in 10,588 individuals.

Cao Yanan, Li Lin, Xu Min, Feng Zhimin, Sun Xiaohui, Lu Jieli, Xu Yu, Du Peina, Wang Tiange, Hu Ruying, Ye Zhen, Shi Lixin, Tang Xulei, Yan Li, Gao Zhengnan, Chen Gang, Zhang Yinfei, Chen Lulu, Ning Guang, Bi Yufang, Wang Weiqing

Cell Research, 2020-04-30, issue 9

PMID: 32355288

Similar Datasets

Project description:

Background: Phylogenetic methods which do not rely on multiple sequence alignments are important tools in inferring trees directly from completely sequenced genomes. Here, we extend the recently described Genome BLAST Distance Phylogeny (GBDP) strategy to compute phylogenetic trees from all completely sequenced plastid genomes currently available and from a selection of mitochondrial genomes representing the major eukaryotic lineages. BLASTN, TBLASTX, or combinations of both are used to locate high-scoring segment pairs (HSPs) between two sequences from which pairwise similarities and distances are computed in different ways resulting in a total of 96 GBDP variants. The suitability of these distance formulae for phylogeny reconstruction is directly estimated by computing a recently described measure of "treelikeness", the so-called delta value, from the respective distance matrices. Additionally, we compare the trees inferred from these matrices using UPGMA, NJ, BIONJ, FastME, or STC, respectively, with the NCBI taxonomy tree of the taxa under study.

Results: Our results indicate that, at this taxonomic level, plastid genomes are much more valuable for inferring phylogenies than are mitochondrial genomes, and that distances based on breakpoints are of little use. Distances based on the proportion of "matched" HSP length to average genome length were best for tree estimation. Additionally we found that using TBLASTX instead of BLASTN and, particularly, combining TBLASTX and BLASTN leads to a small but significant increase in accuracy. Other factors do not significantly affect the phylogenetic outcome. The BIONJ algorithm results in phylogenies most in accordance with the current NCBI taxonomy, with NJ and FastME performing insignificantly worse, and STC performing as well if applied to high quality distance matrices. Delta values are found to be a reliable predictor of phylogenetic accuracy.

Conclusion: Using the most treelike distance matrices, as judged by their delta values, distance methods are able to recover all major plant lineages, and are more in accordance with Apicomplexa organelles being derived from "green" plastids than from plastids of the "red" type. GBDP-like methods can be used to reliably infer phylogenies from different kinds of genomic data. A framework is established to further develop and improve such methods. Delta values are a topology-independent tool of general use for the development and assessment of distance methods for phylogenetic inference.
