Dataset Information

Ranking and combining multiple predictors without labeled data.

ABSTRACT: In a broad range of classification and decision-making problems, one is given the advice or predictions of several classifiers, of unknown reliability, over multiple questions or queries. This scenario is different from the standard supervised setting, where each classifier's accuracy can be assessed using available labeled data, and raises two questions: Given only the predictions of several classifiers over a large set of unlabeled test data, is it possible to (i) reliably rank them and (ii) construct a metaclassifier more accurate than most classifiers in the ensemble? Here we present a spectral approach to address these questions. First, assuming conditional independence between classifiers, we show that the off-diagonal entries of their covariance matrix correspond to a rank-one matrix. Moreover, the classifiers can be ranked using the leading eigenvector of this covariance matrix, because its entries are proportional to their balanced accuracies. Second, via a linear approximation to the maximum likelihood estimator, we derive the Spectral Meta-Learner (SML), an unsupervised ensemble classifier whose weights are equal to these eigenvector entries. On both simulated and real data, SML typically achieves a higher accuracy than most classifiers in the ensemble and can provide a better starting point than majority voting for estimating the maximum likelihood solution. Furthermore, SML is robust to the presence of small malicious groups of classifiers designed to veer the ensemble prediction away from the (unknown) ground truth.
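The spectral idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and everything named here is an assumption made for illustration: labels are coded as -1/+1, the data are simulated rather than taken from the paper, and the helper names (`simulate_predictions`, `fit_rank_one_offdiag`, `sml_predict`) are hypothetical. In place of an explicit eigendecomposition, the sketch fits a vector `r` whose products `r[i] * r[j]` approximate the off-diagonal covariance entries by simple coordinate updates; such a vector is proportional to the leading eigenvector of the rank-one matrix it defines. The entries of `r` are then used both to rank the classifiers and as weights of a linear ensemble vote.

```python
import random


def simulate_predictions(accuracies, n_samples, seed=0):
    """Simulate conditionally independent classifiers voting in {-1, +1}."""
    rng = random.Random(seed)
    truth = [rng.choice((-1, 1)) for _ in range(n_samples)]
    preds = [[y if rng.random() < acc else -y for y in truth]
             for acc in accuracies]
    return truth, preds


def covariance_matrix(preds):
    """Sample covariance between classifiers (one row of preds each)."""
    m, n = len(preds), len(preds[0])
    centered = []
    for row in preds:
        mean = sum(row) / n
        centered.append([x - mean for x in row])
    return [[sum(a * b for a, b in zip(centered[i], centered[j])) / (n - 1)
             for j in range(m)] for i in range(m)]


def fit_rank_one_offdiag(cov, sweeps=200):
    """Fit r so that r[i] * r[j] approximates cov[i][j] for all i != j.

    Assumption: a plain coordinate-descent least-squares fit that ignores
    the diagonal; the paper may estimate the rank-one part differently.
    """
    m = len(cov)
    r = [0.5] * m
    for _ in range(sweeps):
        for i in range(m):
            num = sum(cov[i][j] * r[j] for j in range(m) if j != i)
            den = sum(r[j] ** 2 for j in range(m) if j != i)
            if den > 0:
                r[i] = num / den
    # Resolve the sign ambiguity by assuming that most classifiers
    # are better than random guessing.
    if sum(r) < 0:
        r = [-x for x in r]
    return r


def sml_predict(preds, weights):
    """Weighted vote: sign of the weighted sum of classifier outputs."""
    n = len(preds[0])
    return [1 if sum(w * row[k] for w, row in zip(weights, preds)) >= 0 else -1
            for k in range(n)]


def accuracy(predicted, truth):
    """Fraction of positions where predicted and truth agree."""
    return sum(p == t for p, t in zip(predicted, truth)) / len(truth)


if __name__ == "__main__":
    truth, preds = simulate_predictions([0.9, 0.8, 0.7, 0.6], 4000, seed=1)
    r = fit_rank_one_offdiag(covariance_matrix(preds))
    ranking = sorted(range(len(r)), key=lambda i: -r[i])
    print("estimated ranking (best first):", ranking)
    print("ensemble accuracy:", accuracy(sml_predict(preds, r), truth))
```

The ground-truth labels are used only to score the result at the end; the ranking and the ensemble weights are computed from the unlabeled predictions alone, which is the setting the abstract describes.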

SUBMITTER: Parisi F

PROVIDER: S-EPMC3910607 | biostudies-literature | 2014 Jan

REPOSITORIES: biostudies-literature

Publications

Ranking and combining multiple predictors without labeled data.

Fabio Parisi, Francesco Strino, Boaz Nadler, Yuval Kluger

Proceedings of the National Academy of Sciences of the United States of America, 2014 Jan 13, issue 4

PMID: 24474744

Similar Datasets

Predicting antibody affinity changes upon mutations by combining multiple predictors. | S-EPMC7658247 | biostudies-literature

Entropy-based gene ranking without selection bias for the predictive classification of microarray data. | S-EPMC293475 | biostudies-literature

UQlust: combining profile hashing with linear-time ranking for efficient clustering and analysis of big macromolecular data. | S-EPMC5198500 | biostudies-literature

Combining Multiple Observational Data Sources to Estimate Causal Effects. | S-EPMC7571608 | biostudies-literature

Combining primary cohort data with external aggregate information without assuming comparability. | S-EPMC8166575 | biostudies-literature

Probabilistic HIV recency classification-a logistic regression without labeled individual level training data. | S-EPMC10577400 | biostudies-literature

Combining multiple imputation and meta-analysis with individual participant data. | S-EPMC3963448 | biostudies-literature

MetAmp: combining amplicon data from multiple markers for OTU analysis. | S-EPMC4443678 | biostudies-literature

A spectral method for assessing and combining multiple data visualizations. | S-EPMC9922271 | biostudies-literature

Design and analysis considerations for combining data from multiple biomarker studies. | S-EPMC6755899 | biostudies-literature