Dataset Information

ProbCD: enrichment analysis accounting for categorization uncertainty.


ABSTRACT: As in many other areas of science, systems biology makes extensive use of statistical association and significance estimates in contingency tables, a type of categorical data analysis known in this field as enrichment (also over-representation or enhancement) analysis. In spite of efforts to create probabilistic annotations, especially in the Gene Ontology context, or to deal with uncertainty in high-throughput-based datasets, current enrichment methods largely ignore this probabilistic information, since they are mainly based on variants of the Fisher Exact Test. We developed ProbCD, an open-source R-based software package for probabilistic categorical data analysis that does not require a static contingency table. The contingency table for the enrichment problem is built using the expectation of a Bernoulli Scheme stochastic process given the categorization probabilities. An on-line interface was created to allow usage by non-programmers and is available at: http://xerad.systemsbiology.net/ProbCD/. We present an analysis framework and software tools to address the issue of uncertainty in categorical data analysis. In particular, concerning enrichment analysis, ProbCD can accommodate: (i) the stochastic nature of high-throughput experimental techniques and (ii) probabilistic gene annotation.
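
The idea of building a contingency table from categorization probabilities rather than from hard counts can be illustrated with a minimal Python sketch. This is not ProbCD's R code or its API. The function name, the two probability vectors, and the assumption that a gene's membership in the selected list is independent of its annotation are all hypothetical choices made for illustration. Each gene contributes fractional mass to the four cells of a 2x2 table, and the table reduces to ordinary counts when every probability is 0 or 1.

```python
from typing import List, Sequence


def expected_contingency_table(
    p_selected: Sequence[float], p_annotated: Sequence[float]
) -> List[List[float]]:
    """Expected 2x2 table when each gene's memberships are Bernoulli draws.

    Assumption (not from the source): for every gene, being in the selected
    list and carrying the annotation are independent events.
    """
    if len(p_selected) != len(p_annotated):
        raise ValueError("both probability vectors need one entry per gene")

    table = [[0.0, 0.0], [0.0, 0.0]]
    for p, q in zip(p_selected, p_annotated):
        if not (0.0 <= p <= 1.0 and 0.0 <= q <= 1.0):
            raise ValueError("probabilities must lie in [0, 1]")
        table[0][0] += p * q                  # selected and annotated
        table[0][1] += p * (1.0 - q)          # selected, not annotated
        table[1][0] += (1.0 - p) * q          # not selected, annotated
        table[1][1] += (1.0 - p) * (1.0 - q)  # neither
    return table


# Hypothetical input: four genes with uncertain selection and annotation.
p_selected = [0.9, 0.8, 0.1, 0.2]
p_annotated = [1.0, 0.7, 0.3, 0.0]
table = expected_contingency_table(p_selected, p_annotated)

# With hard 0/1 memberships the result is an ordinary count table.
hard_table = expected_contingency_table([1, 1, 0, 0], [1, 0, 1, 0])
```

The cells always sum to the number of genes, so any association statistic defined on a 2x2 table can in principle be evaluated on the expected table. Which statistic ProbCD uses is not stated in this record.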

SUBMITTER: Vencio RZ 

PROVIDER: S-EPMC2169266 | biostudies-literature | 2007 Oct

REPOSITORIES: biostudies-literature

Publications

ProbCD: enrichment analysis accounting for categorization uncertainty.

Ricardo Z N Vêncio, Ilya Shmulevich

BMC Bioinformatics, 2007 Oct 12


Similar Datasets

| S-EPMC3260272 | biostudies-literature
| S-EPMC3278194 | biostudies-literature
| S-EPMC4600370 | biostudies-literature
| S-EPMC8059656 | biostudies-literature
| S-EPMC10337143 | biostudies-literature
| S-EPMC3864339 | biostudies-literature
| S-EPMC6058391 | biostudies-literature
| S-EPMC6768727 | biostudies-literature
| S-EPMC7075019 | biostudies-literature
| S-EPMC7297745 | biostudies-literature