Dataset Information

Centroid estimation in discrete high-dimensional spaces with applications in biology.

ABSTRACT: Maximum likelihood estimators and other direct optimization-based estimators dominated statistical estimation and prediction for decades. Yet, the principled foundations supporting their dominance do not apply to the discrete high-dimensional inference problems of the 21st century. As is well known, statistical decision theory shows that maximum likelihood and related estimators use data only to identify the single most probable solution. Accordingly, unless this one solution so dominates the immense ensemble of all solutions that its probability is near one, there is no principled reason to expect such an estimator to be representative of the posterior-weighted ensemble of solutions, and thus to represent inferences drawn from the data. We employ statistical decision theory to find more representative estimators, centroid estimators, in a general high-dimensional discrete setting by using a family of loss functions with penalties that increase with the number of differences in components. We show that centroid estimates are obtained by maximizing the marginal probabilities of the solution components, both for unconstrained ensembles and for an important class of problems whose ensembles contain exclusivity constraints, including sequence alignment and the prediction of RNA secondary structure. Three genomics examples are described that show that these estimators substantially improve predictions of ground-truth reference sets.
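
The contrast the abstract draws between the single most probable solution and a centroid estimate can be illustrated on a toy case. The sketch below is not the authors' code. It assumes binary solution components, an unconstrained ensemble, and plain Hamming loss (one simple member of the loss family described above), for which the centroid sets each component to 1 when its marginal probability exceeds 1/2. The posterior values and function names are made up for illustration.

```python
from itertools import product

def map_estimate(posterior):
    """Return the single most probable solution."""
    return max(posterior, key=posterior.get)

def component_marginals(posterior):
    """Marginal probability that each component equals 1."""
    n = len(next(iter(posterior)))
    return [sum(p for x, p in posterior.items() if x[i] == 1) for i in range(n)]

def centroid_estimate(posterior):
    """Hamming-loss centroid: keep each component whose marginal exceeds 1/2."""
    return tuple(1 if m > 0.5 else 0 for m in component_marginals(posterior))

def expected_hamming_loss(estimate, posterior):
    """Posterior-weighted number of components where the estimate differs."""
    return sum(
        p * sum(a != b for a, b in zip(estimate, x))
        for x, p in posterior.items()
    )

# Hypothetical posterior over 3-component binary solutions (sums to 1).
posterior = {
    (0, 0, 0): 0.28,  # most probable single solution, yet far from dominant
    (1, 1, 0): 0.26,
    (1, 0, 1): 0.24,
    (1, 1, 1): 0.22,
}

map_sol = map_estimate(posterior)            # (0, 0, 0)
centroid_sol = centroid_estimate(posterior)  # (1, 0, 0)

print("MAP:     ", map_sol, expected_hamming_loss(map_sol, posterior))
print("Centroid:", centroid_sol, expected_hamming_loss(centroid_sol, posterior))
```

In this toy posterior the most probable solution carries only 28% of the mass, and the centroid differs from it in the first component, whose marginal is well above 1/2. The centroid here is not itself one of the listed solutions, which is acceptable only because the ensemble is assumed unconstrained. The constrained cases named in the abstract need additional handling that this sketch does not attempt.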

SUBMITTER: Carvalho LE

PROVIDER: S-EPMC2265131 | biostudies-other | 2008 Mar

REPOSITORIES: biostudies-other

Publications

Centroid estimation in discrete high-dimensional spaces with applications in biology.

Luis E. Carvalho, Charles E. Lawrence

Proceedings of the National Academy of Sciences of the United States of America, 2008-02-27, issue 9

PMID: 18305160
