Dataset Information

Log-odds sequence logos.


ABSTRACT: MOTIVATION: DNA and protein patterns are usefully represented by sequence logos. However, the methods for logo generation in common use lack a proper statistical basis, and are non-optimal for recognizing functionally relevant alignment columns. RESULTS: We redefine the information at a logo position as a per-observation multiple alignment log-odds score. Such scores are positive or negative, depending on whether a column's observations are better explained as arising from relatedness or chance. Within this framework, we propose distinct normalized maximum likelihood and Bayesian measures of column information. We illustrate these measures on High Mobility Group B (HMGB) box proteins and a dataset of enzyme alignments. Particularly in the context of protein alignments, our measures improve the discrimination of biologically relevant positions. AVAILABILITY AND IMPLEMENTATION: Our new measures are implemented in an open-source Web-based logo generation program, which is available at http://www.ncbi.nlm.nih.gov/CBBresearch/Yu/logoddslogo/index.html. A stand-alone version of the program is also available from this site. CONTACT: altschul@ncbi.nlm.nih.gov SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

SUBMITTER: Yu YK 

PROVIDER: S-EPMC4318935 | biostudies-literature | 2015 Feb

REPOSITORIES: biostudies-literature

Publications

Log-odds sequence logos.

Yi-Kuo Yu, John A. Capra, Aleksandar Stojmirović, David Landsman, Stephen F. Altschul

Bioinformatics (Oxford, England), 2014-10-06, issue 3


Similar Datasets

| S-EPMC2904766 | biostudies-literature
| S-EPMC5867187 | biostudies-other
| S-EPMC7053742 | biostudies-literature
| S-EPMC7141850 | biostudies-literature
| S-EPMC4834280 | biostudies-literature
| S-EPMC4339682 | biostudies-literature
| S-EPMC4155610 | biostudies-literature
| S-EPMC3828135 | biostudies-literature
| S-EPMC7427861 | biostudies-literature
| S-EPMC6416084 | biostudies-literature