Dataset Information

Analysis of protein-coding genetic variation in 60,706 humans.

ABSTRACT: Large-scale reference data sets of human genetic variation are critical for the medical and functional interpretation of DNA sequence changes. Here we describe the aggregation and analysis of high-quality exome (protein-coding region) DNA sequence data for 60,706 individuals of diverse ancestries generated as part of the Exome Aggregation Consortium (ExAC). This catalogue of human genetic diversity contains an average of one variant every eight bases of the exome, and provides direct evidence for the presence of widespread mutational recurrence. We have used this catalogue to calculate objective metrics of pathogenicity for sequence variants, and to identify genes subject to strong selection against various classes of mutation, identifying 3,230 genes with near-complete depletion of predicted protein-truncating variants, 72% of which have no currently established human disease phenotype. Finally, we demonstrate that these data can be used for the efficient filtering of candidate disease-causing variants, and for the discovery of human 'knockout' variants in protein-coding genes.
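The abstract says the data support efficient filtering of candidate disease-causing variants. One common form of this is to drop candidates that are too common in a reference panel to plausibly cause a rare disease. The Python below is a minimal sketch of that idea, not the ExAC pipeline or its file format. The `RefSite` record, the `filter_candidates` helper, the example sites and the 0.1% cutoff (`max_af=0.001`) are all hypothetical. The only figure taken from the source is the 60,706 individuals, which gives at most 2 × 60,706 = 121,412 called alleles per site.

```python
from dataclasses import dataclass

# Upper bound on allele number (AN) if all 60,706 diploid individuals are
# called at a site; real per-site AN is usually lower because of coverage.
MAX_AN = 2 * 60_706  # 121,412 chromosomes


@dataclass
class RefSite:
    """Hypothetical per-variant counts from a reference panel."""
    allele_count: int   # AC: copies of the alternate allele observed
    allele_number: int  # AN: total called alleles at the site


def allele_frequency(site: RefSite) -> float:
    """AF = AC / AN; a site with no called alleles is treated as unobserved."""
    if site.allele_number <= 0:
        return 0.0
    return site.allele_count / site.allele_number


def filter_candidates(candidates, reference, max_af=0.001):
    """Keep candidates that are absent from, or rare in, the reference panel.

    `candidates` holds variant keys such as (chrom, pos, ref, alt) tuples;
    `reference` maps those keys to RefSite counts. `max_af` is an assumed cutoff.
    """
    kept = []
    for key in candidates:
        site = reference.get(key)
        if site is None or allele_frequency(site) <= max_af:
            kept.append(key)
    return kept


reference = {
    ("1", 1000, "A", "G"): RefSite(allele_count=5000, allele_number=120000),  # AF ~ 4.2%
    ("1", 2000, "C", "T"): RefSite(allele_count=3, allele_number=118000),     # AF ~ 0.0025%
}
candidates = [("1", 1000, "A", "G"), ("1", 2000, "C", "T"), ("2", 500, "G", "A")]
print(filter_candidates(candidates, reference))
# → [('1', 2000, 'C', 'T'), ('2', 500, 'G', 'A')]
```

The common variant is removed. The rare variant and the one absent from the panel are kept for further review. A real analysis would also consider ancestry-specific frequencies and the assumed inheritance model when choosing the cutoff.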

SUBMITTER: Lek M

PROVIDER: S-EPMC5018207 | biostudies-literature | 2016 Aug

REPOSITORIES: biostudies-literature

Publications

Analysis of protein-coding genetic variation in 60,706 humans.

Lek M, Karczewski KJ, Minikel EV, Samocha KE, Banks E, Fennell T, O'Donnell-Luria AH, Ware JS, Hill AJ, Cummings BB, Tukiainen T, Birnbaum DP, Kosmicki JA, Duncan LE, Estrada K, Zhao F, Zou J, Pierce-Hoffman E, Berghout J, Cooper DN, Deflaux N, DePristo M, Do R, Flannick J, Fromer M, Gauthier L, Goldstein J, Gupta N, Howrigan D, Kiezun A, Kurki MI, Moonshine AL, Natarajan P, Orozco L, Peloso GM, Poplin R, Rivas MA, Ruano-Rubio V, Rose SA, Ruderfer DM, Shakir K, Stenson PD, Stevens C, Thomas BP, Tiao G, Tusie-Luna MT, Weisburd B, Won HH, Yu D, Altshuler DM, Ardissino D, Boehnke M, Danesh J, Donnelly S, Elosua R, Florez JC, Gabriel SB, Getz G, Glatt SJ, Hultman CM, Kathiresan S, Laakso M, McCarroll S, McCarthy MI, McGovern D, McPherson R, Neale BM, Palotie A, Purcell SM, Saleheen D, Scharf JM, Sklar P, Sullivan PF, Tuomilehto J, Tsuang MT, Watkins HC, Wilson JG, Daly MJ, MacArthur DG

Nature, 1 August 2016, issue 7616

PMID: 27535533