Dataset Information

Data smashing: uncovering lurking order in data.

ABSTRACT: From automatic speech recognition to discovering unusual stars, underlying almost all automated discovery tasks is the ability to compare and contrast data streams with each other, to identify connections and spot outliers. Despite the prevalence of data, however, automated methods are not keeping pace. A key bottleneck is that most data comparison algorithms today rely on a human expert to specify what 'features' of the data are relevant for comparison. Here, we propose a new principle for estimating the similarity between the sources of arbitrary data streams, using neither domain knowledge nor learning. We demonstrate the application of this principle to the analysis of data from a number of real-world challenging problems, including the disambiguation of electro-encephalograph patterns pertaining to epileptic seizures, detection of anomalous cardiac activity from heart sound recordings and classification of astronomical objects from raw photometry. In all these cases and without access to any domain knowledge, we demonstrate performance on a par with the accuracy achieved by specialized algorithms and heuristics devised by domain experts. We suggest that data smashing principles may open the door to understanding increasingly complex observations, especially when experts do not know what to look for.

SUBMITTER: Chattopadhyay I

PROVIDER: S-EPMC4223903 | biostudies-other | 2014 Dec

REPOSITORIES: biostudies-other

Publications

Data smashing: uncovering lurking order in data.

Chattopadhyay I, Lipson H

Journal of the Royal Society, Interface. 2014 Dec 1; (101)

PMID: 25401180

Similar Datasets

Uncovering the subtype-specific temporal order of cancer pathway dysregulation. | S-EPMC6872169 | biostudies-literature

Parenclitic networks: uncovering new functions in biological data. | S-EPMC4037713 | biostudies-other

Uncovering hidden duplicated content in public transcriptomics data. | S-EPMC3595988 | biostudies-literature

Uncovering Effective Explanations for Interactive Genomic Data Analysis. | S-EPMC7660438 | biostudies-literature

Imaging genomics: data fusion in uncovering disease heritability. | S-EPMC10507799 | biostudies-literature

Uncovering phase-coupled oscillatory networks in electrophysiological data. | S-EPMC6869115 | biostudies-literature

Assessing phenotype order in molecular data. | S-EPMC6692304 | biostudies-literature

Uncovering the rules for protein-protein interactions from yeast genomic data. | S-EPMC2656152 | biostudies-other

Simplivariate models: uncovering the underlying biology in functional genomics data. | S-EPMC3116836 | biostudies-literature

Bi-order multimodal integration of single-cell data. | S-EPMC9082907 | biostudies-literature