Dataset Information

Finding sequence motifs with Bayesian models incorporating positional information: an application to transcription factor binding sites.


ABSTRACT:

Background

Biologically active sequence motifs often have positional preferences with respect to a genomic landmark. For example, many known transcription factor binding sites (TFBSs) occur within an interval [-300, 0] bases upstream of a transcription start site (TSS). Although some programs for identifying sequence motifs exploit positional information, most of them model it only implicitly and with ad hoc methods, making them unsuitable for general motif searches.

Results

A-GLAM, a user-friendly computer program for identifying sequence motifs, now incorporates a Bayesian model systematically combining sequence and positional information. A-GLAM's predictions with and without positional information were compared on two human TFBS datasets, each containing sequences corresponding to the interval [-2000, 0] bases upstream of a known TSS. A rigorous statistical analysis showed that positional information significantly improved the prediction of sequence motifs, and an extensive cross-validation study showed that A-GLAM's model was robust against mild misspecification of its parameters. As expected, when sequences in the datasets were successively truncated to the intervals [-1000, 0], [-500, 0] and [-250, 0], positional information aided motif prediction less and less, but never hurt it significantly.

Conclusion

Although sequence truncation is a viable strategy when searching for biologically active motifs with a positional preference, a probabilistic model (used reasonably) generally provides a superior and more robust strategy, particularly when the sequence motifs' positional preferences are not well characterized.
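The contrast drawn above between sequence truncation and a probabilistic model can be illustrated with a small sketch. It is not A-GLAM's actual model: the toy weight matrix, the uniform background, the Gaussian form of the positional prior and all function names are assumptions made purely for illustration. Each candidate site is scored by a sequence log-odds term plus a log positional prior; hard truncation then appears as the special case of a prior that is flat inside an interval and zero outside it.

```python
import math

# Hypothetical toy position weight matrix (PWM) for a 4-base motif:
# one dict of base probabilities per motif column. Values are illustrative only.
PWM = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
]
BACKGROUND = 0.25  # assumed uniform background base frequency


def sequence_log_odds(site):
    """Log-odds of a candidate site under the PWM versus the background."""
    return sum(math.log(col[base] / BACKGROUND) for col, base in zip(PWM, site))


def gaussian_log_prior(pos, mean=-150.0, sd=100.0):
    """Unnormalized log prior on site position relative to the TSS (assumed form)."""
    return -0.5 * ((pos - mean) / sd) ** 2


def truncation_log_prior(pos, lo=-300, hi=0):
    """Hard truncation written as a prior: flat inside [lo, hi], impossible outside."""
    return 0.0 if lo <= pos <= hi else -math.inf


def best_site(seq, log_prior=None):
    """Return (position, score) of the best-scoring site in an upstream sequence.

    The sequence is assumed to end at the TSS, so a site starting at index i
    has position i - len(seq) relative to the TSS.
    """
    width = len(PWM)
    best = None
    for i in range(len(seq) - width + 1):
        pos = i - len(seq)
        score = sequence_log_odds(seq[i:i + width])
        if log_prior is not None:
            score += log_prior(pos)
        if best is None or score > best[1]:
            best = (pos, score)
    return best


# Toy 1000-base upstream region: a perfect match (ATGC) far upstream at -900
# and a one-mismatch site (ATGA) at -150, embedded in a run of C.
seq = "C" * 100 + "ATGC" + "C" * 746 + "ATGA" + "C" * 146

print(best_site(seq)[0])                        # sequence only: -900
print(best_site(seq, gaussian_log_prior)[0])    # soft positional prior: -150
print(best_site(seq, truncation_log_prior)[0])  # hard truncation: -150
```

In this toy case both strategies prefer the site near the TSS, but the truncation prior discards everything outside its interval outright, whereas the soft prior only down-weights distant sites. That difference matters when the positional preference is not well characterized.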

SUBMITTER: Kim NK 

PROVIDER: S-EPMC2432075 | biostudies-literature | 2008 Jun

REPOSITORIES: biostudies-literature

Publications

Finding sequence motifs with Bayesian models incorporating positional information: an application to transcription factor binding sites.

Nak-Kyeong Kim, Kannan Tharakaraman, Leonardo Mariño-Ramírez, John L. Spouge

BMC Bioinformatics, 2008-06-04


Similar Datasets

| S-EPMC463320 | biostudies-literature
| S-EPMC7868052 | biostudies-literature
| S-EPMC3650864 | biostudies-literature
| S-EPMC3460961 | biostudies-literature
| S-EPMC2588498 | biostudies-literature
| S-EPMC2847231 | biostudies-other
| S-EPMC4846880 | biostudies-literature
| S-EPMC2824720 | biostudies-literature
| S-EPMC2831004 | biostudies-literature
| S-EPMC8595614 | biostudies-literature