Dataset Information

Systematic discovery of conservation states for single-nucleotide annotation of the human genome.


ABSTRACT: Comparative genomics sequence data is an important source of information for interpreting genomes. Genome-wide annotations based on this data have largely focused on univariate scores or binary elements of evolutionary constraint. Here we present a complementary whole genome annotation approach, ConsHMM, which applies a multivariate hidden Markov model to learn de novo 'conservation states' based on the combinatorial and spatial patterns of which species align to and match a reference genome in a multiple species DNA sequence alignment. We applied ConsHMM to a 100-way vertebrate sequence alignment to annotate the human genome at single nucleotide resolution into 100 conservation states. These states have distinct enrichments for other genomic information including gene annotations, chromatin states, repeat families, and bases prioritized by various variant prioritization scores. Constrained elements have distinct heritability partitioning enrichments depending on their conservation state assignment. ConsHMM conservation states are a resource for analyzing genomes and genetic variants.
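The abstract describes the model's input as the pattern of which species "align to and match" the reference at each position. A minimal sketch of one way such per-position observations could be encoded is shown here. It is illustrative only and is not the ConsHMM implementation: the three-symbol coding, the use of "-" to mean "no aligned base", and the names `encode_column` and `encode_alignment` are all assumptions made for this example.

```python
# Hypothetical observation codes (assumed for illustration, not taken from ConsHMM).
ALIGNED_MATCH = 0     # species aligns and has the same nucleotide as the reference
ALIGNED_MISMATCH = 1  # species aligns but has a different nucleotide
NOT_ALIGNED = 2       # species has no aligned nucleotide at this position

GAP = "-"  # assumed marker for "no aligned base" in the toy alignment rows


def encode_column(ref_base, species_bases):
    """Encode one alignment column as one code per non-reference species."""
    ref = ref_base.upper()
    codes = []
    for base in species_bases:
        if base == GAP:
            codes.append(NOT_ALIGNED)
        elif base.upper() == ref:
            codes.append(ALIGNED_MATCH)
        else:
            codes.append(ALIGNED_MISMATCH)
    return codes


def encode_alignment(ref_seq, species_seqs):
    """Return one observation vector per reference position."""
    return [
        encode_column(ref_seq[i], [seq[i] for seq in species_seqs])
        for i in range(len(ref_seq))
    ]


# Toy example: a 4-base reference aligned against three other species.
observations = encode_alignment("ACGT", ["ACGA", "A-GT", "----"])
```

Each position then carries one multivariate observation (one entry per species), which is the kind of input a multivariate hidden Markov model could consume to assign a conservation state to every base.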

SUBMITTER: Arneson A 

PROVIDER: S-EPMC6606595 | biostudies-literature | 2019

REPOSITORIES: biostudies-literature

Publications

Systematic discovery of conservation states for single-nucleotide annotation of the human genome.

Arneson A, Ernst J

Communications Biology, 2019-07-02

Similar Datasets

| S-EPMC8175581 | biostudies-literature
| S-EPMC7373132 | biostudies-literature
| S-EPMC2919626 | biostudies-literature
2019-07-24 | GSE129423 | GEO
2019-07-24 | GSE129422 | GEO
2019-07-24 | GSE129421 | GEO
2019-07-24 | GSE129420 | GEO
2019-07-24 | GSE129419 | GEO
2019-07-24 | GSE129418 | GEO
2019-07-24 | GSE129417 | GEO