Dataset Information

Remote homology search with hidden Potts models.


ABSTRACT: Most methods for biological sequence homology search and alignment work with primary sequence alone, neglecting higher-order correlations. Recently, statistical physics models called Potts models have been used to infer all-by-all pairwise correlations between sites in deep multiple sequence alignments, and these pairwise couplings have improved 3D structure predictions. Here we extend the use of Potts models from structure prediction to sequence alignment and homology search by developing what we call a hidden Potts model (HPM), which merges a Potts emission process with a generative probability model of insertion and deletion. Because an HPM is incompatible with efficient dynamic programming alignment algorithms, we develop an approximate algorithm based on importance sampling, using simpler probabilistic models as proposal distributions. We test an HPM implementation on RNA structure homology search benchmarks, where we can compare directly to exact alignment methods that capture nested RNA base-pairing correlations (stochastic context-free grammars). HPMs perform promisingly in these proof-of-principle experiments.
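
The importance-sampling idea mentioned in the abstract can be illustrated with a toy sketch. This is not the HPM algorithm from the paper, which samples alignments; it only shows the general principle of estimating an intractable sum under a Potts model by drawing samples from a simpler, site-independent proposal and reweighting them. All names (`make_toy_model`, `potts_log_weight`, `exact_sum`, `importance_estimate`), the three-site RNA alphabet model, and the random parameters are hypothetical and chosen so that the exact answer can be enumerated for comparison.

```python
import itertools
import math
import random

ALPHABET = "ACGU"  # assumed RNA alphabet for the toy example
L = 3              # toy number of sites, small enough to enumerate exactly


def make_toy_model(rng):
    """Build hypothetical Potts parameters and a site-independent proposal."""
    # Single-site fields h_i(a) and pairwise couplings e_ij(a, b), drawn at random.
    h = [{a: rng.uniform(-0.5, 0.5) for a in ALPHABET} for _ in range(L)]
    e = {
        (i, j): {(a, b): rng.uniform(-0.3, 0.3) for a in ALPHABET for b in ALPHABET}
        for i in range(L)
        for j in range(i + 1, L)
    }
    # Proposal q_i(a) proportional to exp(h_i(a)): it ignores all couplings.
    q = []
    for i in range(L):
        z_i = sum(math.exp(h[i][a]) for a in ALPHABET)
        q.append({a: math.exp(h[i][a]) / z_i for a in ALPHABET})
    return h, e, q


def potts_log_weight(seq, h, e):
    """Unnormalized Potts log-weight: site fields plus all pairwise couplings."""
    total = sum(h[i][a] for i, a in enumerate(seq))
    for i in range(len(seq)):
        for j in range(i + 1, len(seq)):
            total += e[(i, j)][(seq[i], seq[j])]
    return total


def exact_sum(h, e):
    """Exact sum of Potts weights over all sequences (feasible only for toy sizes)."""
    return sum(
        math.exp(potts_log_weight(s, h, e))
        for s in itertools.product(ALPHABET, repeat=L)
    )


def importance_estimate(h, e, q, n_samples, rng):
    """Estimate the same sum as the average of weight(seq) / q(seq), seq drawn from q."""
    total = 0.0
    for _ in range(n_samples):
        seq = tuple(
            rng.choices(ALPHABET, weights=[q[i][a] for a in ALPHABET])[0]
            for i in range(L)
        )
        log_q = sum(math.log(q[i][a]) for i, a in enumerate(seq))
        total += math.exp(potts_log_weight(seq, h, e) - log_q)
    return total / n_samples


if __name__ == "__main__":
    h, e, q = make_toy_model(random.Random(0))
    print("exact:   ", exact_sum(h, e))
    print("estimate:", importance_estimate(h, e, q, 20000, random.Random(1)))
```

In this sketch the proposal plays the role that the abstract assigns to "simpler probabilistic models": it is cheap to sample from and close enough to the target that the importance weights stay well behaved. How the actual HPM implementation chooses and uses its proposals is described in the paper, not here.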

SUBMITTER: Wilburn GW 

PROVIDER: S-EPMC7728182 | biostudies-literature | 2020 Nov

REPOSITORIES: biostudies-literature

Publications

Remote homology search with hidden Potts models.

Grey W. Wilburn, Sean R. Eddy

PLoS Computational Biology, 2020-11-30, issue 11


Similar Datasets

| S-EPMC6797059 | biostudies-literature
| S-EPMC7514434 | biostudies-literature
2021-07-21 | GSE179646 | GEO
| S-EPMC7875865 | biostudies-literature
2024-07-29 | GSE272969 | GEO
2022-03-04 | GSE189259 | GEO
| S-EPMC1933213 | biostudies-literature
2021-07-21 | GSE179641 | GEO
2021-07-21 | GSE179638 | GEO
2013-03-21 | E-GEOD-44844 | biostudies-arrayexpress