Dataset Information

ProbCD: enrichment analysis accounting for categorization uncertainty.

ABSTRACT: As in many other areas of science, systems biology makes extensive use of statistical association and significance estimates in contingency tables, a type of categorical data analysis known in this field as enrichment (also over-representation or enhancement) analysis. In spite of efforts to create probabilistic annotations, especially in the Gene Ontology context, or to deal with uncertainty in high-throughput-based datasets, current enrichment methods largely ignore this probabilistic information since they are mainly based on variants of the Fisher Exact Test. We developed ProbCD, open-source R-based software for probabilistic categorical data analysis that does not require a static contingency table. The contingency table for the enrichment problem is built using the expectation of a Bernoulli Scheme stochastic process given the categorization probabilities. An on-line interface was created to allow usage by non-programmers and is available at: http://xerad.systemsbiology.net/ProbCD/. We present an analysis framework and software tools to address the issue of uncertainty in categorical data analysis. In particular, concerning the enrichment analysis, ProbCD can accommodate: (i) the stochastic nature of the high-throughput experimental techniques and (ii) probabilistic gene annotation.
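The idea of building the contingency table from an expectation, rather than from fixed counts, can be illustrated with a minimal sketch. It is not the ProbCD implementation (which is written in R) and does not reproduce its API: the function names `expected_contingency_table` and `odds_ratio` are hypothetical, and the sketch assumes that, for each gene, membership in the study list and membership in the category are independent Bernoulli events with known probabilities.

```python
def expected_contingency_table(p_selected, p_category):
    """Expected 2x2 table when both memberships are uncertain.

    p_selected[i]: probability that gene i belongs to the study list
                   (e.g. is differentially expressed).
    p_category[i]: probability that gene i is annotated to the category.
    Assumption (ours, for illustration): the two memberships are
    independent for each gene.
    """
    if len(p_selected) != len(p_category):
        raise ValueError("probability vectors must have the same length")
    table = [[0.0, 0.0], [0.0, 0.0]]
    for p, q in zip(p_selected, p_category):
        table[0][0] += p * q                # in list, in category
        table[0][1] += p * (1 - q)          # in list, not in category
        table[1][0] += (1 - p) * q          # not in list, in category
        table[1][1] += (1 - p) * (1 - q)    # in neither
    return table


def odds_ratio(table):
    """Simple association measure that also works on non-integer cells."""
    (a, b), (c, d) = table
    return (a * d) / (b * c)


# With crisp (0/1) probabilities the table reduces to ordinary counts.
crisp = expected_contingency_table([1, 1, 0, 0], [1, 0, 1, 0])

# With uncertain memberships the cells become fractional expected counts.
soft = expected_contingency_table([0.9, 0.8, 0.1, 0.2], [1.0, 0.7, 0.0, 0.3])
```

Because the expected cells are generally not integers, a count-based test such as the Fisher Exact Test cannot be applied to them directly, which is why the sketch uses an odds ratio as a stand-in association measure.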

SUBMITTER: Vencio RZ

PROVIDER: S-EPMC2169266 | biostudies-literature | 2007 Oct

REPOSITORIES: biostudies-literature


Publications

ProbCD: enrichment analysis accounting for categorization uncertainty.

Vêncio RZ, Shmulevich I

BMC Bioinformatics, 2007-10-12


PMID: 17935624

Similar Datasets

Project description: Background: Characterizing animal space use is critical for understanding ecological relationships. Animal telemetry technology has revolutionized the fields of ecology and conservation biology by providing high quality spatial data on animal movement. Radio-telemetry with very high frequency (VHF) radio signals continues to be a useful technology because of its low cost, miniaturization, and low battery requirements. Despite a number of statistical developments synthetically integrating animal location estimation and uncertainty with spatial process models using satellite telemetry data, we are unaware of similar developments for azimuthal telemetry data. As such, there are few statistical options to handle these unique data and no synthetic framework for modeling animal location uncertainty and accounting for it in ecological models. We developed a hierarchical modeling framework to provide robust animal location estimates from one or more intersecting or non-intersecting azimuths. We used our azimuthal telemetry model (ATM) to account for azimuthal uncertainty with covariates and propagate location uncertainty into spatial ecological models. We evaluate the ATM with commonly used estimators (Lenth (1981) maximum likelihood and M-Estimators) using simulation. We also provide illustrative empirical examples, demonstrating the impact of ignoring location uncertainty within home range and resource selection analyses. We further use simulation to better understand the relationship among location uncertainty, spatial covariate autocorrelation, and resource selection inference. Results: We found the ATM to have good performance in estimating locations and the only model that has appropriate measures of coverage. Ignoring animal location uncertainty when estimating resource selection or home ranges can have pernicious effects on ecological inference. Home range estimates can be overly confident and conservative when ignoring location uncertainty and resource selection coefficients can lead to incorrect inference and over confidence in the magnitude of selection. Furthermore, our simulation study clarified that incorporating location uncertainty helps reduce bias in resource selection coefficients across all levels of covariate spatial autocorrelation. Conclusion: The ATM can accommodate one or more azimuths when estimating animal locations, regardless of how they intersect; this ensures that all data collected are used for ecological inference. Our findings and model development have important implications for interpreting historical analyses using this type of data and the future design of radio-telemetry studies.
