
Dataset Information


An expanded sequence context model broadly explains variability in polymorphism levels across the human genome.


ABSTRACT: The rate of single-nucleotide polymorphism varies substantially across the human genome and fundamentally influences evolution and incidence of genetic disease. Previous studies have only considered the immediately flanking nucleotides around a polymorphic site--the site's trinucleotide sequence context--to study polymorphism levels across the genome. Moreover, the impact of larger sequence contexts has not been fully clarified, even though context substantially influences rates of polymorphism. Using a new statistical framework and data from the 1000 Genomes Project, we demonstrate that a heptanucleotide context explains >81% of variability in substitution probabilities, highlighting new mutation-promoting motifs at ApT dinucleotide, CAAT and TACG sequences. Our approach also identifies previously undocumented variability in C-to-T substitutions at CpG sites, which is not immediately explained by differential methylation intensity. Using our model, we present informative substitution intolerance scores for genes and a new intolerance score for amino acids, and we demonstrate clinical use of the model in neuropsychiatric diseases.
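To make the idea of a heptanucleotide sequence context concrete, the sketch below extracts the 7-mer centred on each site (three flanking bases on either side) and estimates a per-context substitution probability as the fraction of occurrences of that context that carry a given alternate allele. This is only an illustrative toy under stated assumptions, not the statistical framework used in the paper: the function names, the `variants` input format, and the toy sequence are hypothetical, and the sketch does not fold reverse-complement contexts or use 1000 Genomes Project data.

```python
from collections import Counter


def context(seq, pos, flank=3):
    """Return the (2*flank+1)-mer centred on pos, or None near the sequence ends."""
    if pos < flank or pos + flank >= len(seq):
        return None
    return seq[pos - flank: pos + flank + 1]


def substitution_probabilities(seq, variants, flank=3):
    """Estimate P(alt | context) as polymorphic occurrences / all occurrences.

    seq      -- reference sequence (string of A/C/G/T)
    variants -- hypothetical toy input: dict of 0-based position -> alternate allele
    """
    seq = seq.upper()
    totals = Counter()  # how often each context occurs in the reference
    hits = Counter()    # how often each (context, alt) pair is polymorphic
    for pos in range(len(seq)):
        ctx = context(seq, pos, flank)
        # Skip sites too close to the ends or with ambiguous bases (e.g. N).
        if ctx is None or set(ctx) - set("ACGT"):
            continue
        totals[ctx] += 1
        alt = variants.get(pos)
        if alt is not None and alt != seq[pos]:
            hits[(ctx, alt)] += 1
    return {key: n / totals[key[0]] for key, n in hits.items()}


# Toy data, invented purely for illustration.
toy_seq = "ACGTACGTACGTACG"
toy_variants = {3: "C", 7: "C"}  # two T-to-C substitutions
probs = substitution_probabilities(toy_seq, toy_variants)
for (ctx, alt), p in sorted(probs.items()):
    print(ctx, "->", alt, round(p, 3))
```

Setting `flank=1` gives the trinucleotide context that the abstract contrasts with the heptanucleotide one. With real data, many of the 7-mer contexts would be rare, so a raw ratio like this would be noisy without some form of model-based smoothing.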

SUBMITTER: Aggarwala V 

PROVIDER: S-EPMC4811712 | biostudies-literature | 2016 Apr

REPOSITORIES: biostudies-literature


Publications

An expanded sequence context model broadly explains variability in polymorphism levels across the human genome.

Aggarwala, Varun; Voight, Benjamin F.

Nature Genetics, 2016 Feb 15, issue 4



Similar Datasets

| S-EPMC6339776 | biostudies-literature
| S-EPMC3923675 | biostudies-literature
| S-EPMC1449538 | biostudies-literature
| S-EPMC4422588 | biostudies-literature
| S-EPMC5536394 | biostudies-literature
| S-EPMC2678550 | biostudies-literature
| S-EPMC5980329 | biostudies-literature
| S-EPMC6961574 | biostudies-literature
| S-EPMC5142776 | biostudies-literature
| S-EPMC6501879 | biostudies-literature