Dataset Information

Modeling SAGE tag formation and its effects on data interpretation within a Bayesian framework.


ABSTRACT:

Background

Serial Analysis of Gene Expression (SAGE) is a high-throughput method for inferring mRNA expression levels from experimentally generated, sequence-based tags. Standard analyses of SAGE data, however, ignore the fact that the probability of generating an observable tag varies across genes and between experiments. As a consequence, these analyses yield biased estimators and biased posterior probability intervals for gene expression levels in the transcriptome.

Results

Using the yeast Saccharomyces cerevisiae as an example, we introduce a new Bayesian method of data analysis which is based on a model of SAGE tag formation. Our approach incorporates the variation in the probability of tag formation into the interpretation of SAGE data and allows us to derive exact joint and approximate marginal posterior distributions for the mRNA frequency of genes detectable using SAGE. Our analysis of these distributions indicates that the frequency of a gene in the tag pool is influenced by its mRNA frequency, the cleavage efficiency of the anchoring enzyme (AE), and the number of informative and uninformative AE cleavage sites within its mRNA.
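The dependence of tag formation on cleavage efficiency and on the number of informative and uninformative AE sites can be illustrated with a minimal sketch. It is not the authors' model: it assumes that each AE site is cleaved independently with the same probability, that the tag is taken from the 3'-most cleaved site, and that only tags from informative sites can be assigned to a gene. The function name `tag_formation_probability` and its parameters are hypothetical.

```python
def tag_formation_probability(site_is_informative, cleavage_prob):
    """Probability that one mRNA molecule yields an informative tag.

    site_is_informative: list of booleans, one per AE site, ordered from
        the 3'-most site toward the 5' end (assumed ordering).
    cleavage_prob: assumed per-site cleavage probability of the AE,
        identical and independent across sites.
    """
    if not 0.0 <= cleavage_prob <= 1.0:
        raise ValueError("cleavage_prob must lie in [0, 1]")

    prob = 0.0
    all_3prime_sites_uncut = 1.0  # chance every site closer to the 3' end is uncut
    for informative in site_is_informative:
        if informative:
            # This site supplies the tag only if it is cut and no site
            # nearer the 3' end was cut.
            prob += all_3prime_sites_uncut * cleavage_prob
        all_3prime_sites_uncut *= 1.0 - cleavage_prob
    return prob


# Two genes with the same number of sites but a different site order.
print(tag_formation_probability([True, False], 0.5))
print(tag_formation_probability([False, True], 0.5))
```

Under these assumptions, an uninformative site lying 3' of an informative one lowers the chance of observing a usable tag, and the size of that effect grows with the cleavage efficiency.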

Conclusion

With a mechanistic, model-based approach to SAGE data analysis, we find that variation among genes in SAGE tag formation is large. However, this variation can be estimated and, importantly, accounted for using the methods we develop here. As a result, SAGE-based estimates of mRNA frequencies can be adjusted to remove the bias introduced by the SAGE tag formation process.
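The idea of adjusting for tag formation bias can be sketched with a simple point estimate. This is an illustrative simplification, not the Bayesian posterior analysis described in the paper: it assumes each gene's tag formation probability is already known and rescales observed tag counts by it before normalizing. The function name `adjust_frequencies` and the gene labels are hypothetical.

```python
def adjust_frequencies(tag_counts, formation_probs):
    """Rescale tag counts by per-gene tag formation probability.

    tag_counts: dict mapping gene -> observed tag count.
    formation_probs: dict mapping gene -> assumed probability that one
        mRNA molecule of that gene yields an observable tag.
    Returns a dict mapping gene -> estimated relative mRNA frequency.
    """
    weights = {}
    for gene, count in tag_counts.items():
        prob = formation_probs[gene]
        if prob <= 0.0:
            # A gene that cannot form a tag is not detectable by SAGE.
            raise ValueError(f"formation probability must be positive: {gene}")
        weights[gene] = count / prob

    total = sum(weights.values())
    if total == 0:
        raise ValueError("no tags observed")
    return {gene: w / total for gene, w in weights.items()}


# Equal tag counts, but gene_b forms tags half as often as gene_a.
counts = {"gene_a": 10, "gene_b": 10}
probs = {"gene_a": 0.5, "gene_b": 0.25}
print(adjust_frequencies(counts, probs))
```

An unadjusted analysis would assign both genes the same frequency; the rescaled estimate gives the gene with the lower tag formation probability a proportionally larger share.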

SUBMITTER: Gilchrist MA 

PROVIDER: S-EPMC2217564 | biostudies-literature | 2007 Oct

REPOSITORIES: biostudies-literature

Publications

Modeling SAGE tag formation and its effects on data interpretation within a Bayesian framework.

Gilchrist MA, Qin H, Zaretzki R

BMC Bioinformatics, 2007-10-18


Similar Datasets

| S-EPMC4846590 | biostudies-other
| S-EPMC4083135 | biostudies-literature
| S-EPMC2367475 | biostudies-literature
| S-EPMC534621 | biostudies-literature
| S-EPMC8282019 | biostudies-literature
| S-EPMC8412351 | biostudies-literature
| S-EPMC6824204 | biostudies-literature
| S-EPMC6245789 | biostudies-literature
| S-EPMC535903 | biostudies-literature
| S-EPMC517707 | biostudies-literature