Dataset Information

Detecting novel associations in large data sets.


ABSTRACT: Identifying interesting relationships between pairs of variables in large data sets is increasingly important. Here, we present a measure of dependence for two-variable relationships: the maximal information coefficient (MIC). MIC captures a wide range of associations, both functional and not, and for functional relationships provides a score that roughly equals the coefficient of determination (R²) of the data relative to the regression function. MIC belongs to a larger class of maximal information-based nonparametric exploration (MINE) statistics for identifying and classifying relationships. We apply MIC and MINE to data sets in global health, gene expression, major-league baseball, and the human gut microbiota and identify known and novel relationships.
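
To make the idea of a grid-based dependence score concrete, here is a minimal Python sketch. It is not the authors' MIC algorithm: it only tries equal-frequency grids rather than searching over grid placements, so it is a cruder score than MIC. The function names (`equal_frequency_bins`, `mutual_information_bits`, `approx_mic`) and the grid-size cap of n^0.6 are assumptions made for this illustration.

```python
import numpy as np


def equal_frequency_bins(values, n_bins):
    """Assign each value to one of n_bins bins holding roughly equal counts."""
    ranks = np.argsort(np.argsort(values))  # rank of each value, 0..n-1
    return (ranks * n_bins) // len(values)


def mutual_information_bits(x_bins, y_bins, nx, ny):
    """Mutual information (in bits) of the nx-by-ny grid of bin counts."""
    joint = np.zeros((nx, ny))
    np.add.at(joint, (x_bins, y_bins), 1)   # count points per grid cell
    joint /= joint.sum()
    px = joint.sum(axis=1, keepdims=True)   # marginal over x bins
    py = joint.sum(axis=0, keepdims=True)   # marginal over y bins
    independent = px @ py
    nonzero = joint > 0
    return float(np.sum(joint[nonzero] * np.log2(joint[nonzero] / independent[nonzero])))


def approx_mic(x, y, exponent=0.6):
    """Largest normalized mutual information over equal-frequency grids.

    Hypothetical simplification: grids are limited to nx * ny <= n**exponent
    cells, and bin edges are fixed by quantiles instead of being optimized.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    max_cells = max(4, int(len(x) ** exponent))
    best = 0.0
    for nx in range(2, max_cells // 2 + 1):
        x_bins = equal_frequency_bins(x, nx)
        for ny in range(2, max_cells // nx + 1):
            y_bins = equal_frequency_bins(y, ny)
            mi = mutual_information_bits(x_bins, y_bins, nx, ny)
            # Dividing by log2(min(nx, ny)) keeps the score near [0, 1].
            best = max(best, mi / np.log2(min(nx, ny)))
    return best


# Illustrative inputs: a noiseless line and two unrelated random samples.
rng = np.random.default_rng(0)
x_line = np.linspace(0.0, 1.0, 500)
score_line = approx_mic(x_line, 2.0 * x_line + 1.0)
score_noise = approx_mic(rng.random(500), rng.random(500))
```

Under these assumptions, a noiseless functional relationship scores close to 1 and independent samples score much lower, which mirrors the qualitative behaviour the abstract describes for MIC.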

SUBMITTER: Reshef DN 

PROVIDER: S-EPMC3325791 | biostudies-other | 2011 Dec

REPOSITORIES: biostudies-other

Similar Datasets

| S-EPMC5161273 | biostudies-literature
| S-EPMC3630015 | biostudies-literature
| S-EPMC2375126 | biostudies-literature
| S-EPMC1800870 | biostudies-literature
| S-EPMC4174433 | biostudies-literature
| S-EPMC3976248 | biostudies-literature
| S-EPMC8631639 | biostudies-literature
| S-EPMC6881151 | biostudies-literature
| S-EPMC3185442 | biostudies-literature