Dataset Information

Semblance: An empirical similarity kernel on probability spaces.


ABSTRACT: In data science, determining proximity between observations is critical to many downstream analyses such as clustering, classification, and prediction. However, when the data's underlying probability distribution is unclear, the function used to compute similarity between data points is often arbitrarily chosen. Here, we present a novel definition of proximity, Semblance, that uses the empirical distribution of a feature to inform the pair-wise similarity between observations. The advantage of Semblance lies in its distribution-free formulation and its ability to place greater emphasis on proximity between observation pairs that fall at the outskirts of the data distribution, as opposed to those toward the center. Semblance is a valid Mercer kernel, allowing its principled use in kernel-based learning algorithms, and for any data modality. We demonstrate its consistently improved performance against conventional methods through simulations and real case studies from diverse applications in single-cell transcriptomics, image reconstruction, and financial forecasting.
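The abstract's central idea, that similarity can depend on the empirical distribution rather than on raw distance, can be illustrated with a small sketch. This is not the authors' definition of Semblance: the function name `empirical_interval_similarity` and the rule used here (one minus the fraction of observations falling in the closed interval between two values) are assumptions chosen only to show how a pair in a sparse tail can score as more similar than a numerically closer pair in the dense center.

```python
# Illustrative sketch only. The scoring rule below is an assumption,
# not the Semblance kernel as defined in the paper.

def empirical_interval_similarity(values):
    """Pairwise similarity for a single feature.

    For each pair (i, j), count how many observations fall in the closed
    interval between values[i] and values[j], and return one minus that
    fraction. Few observations in between means high similarity.
    """
    n = len(values)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            lo, hi = sorted((values[i], values[j]))
            inside = sum(1 for v in values if lo <= v <= hi)
            sim[i][j] = 1.0 - inside / n
    return sim


# Hypothetical toy feature: three values packed together, two outliers.
values = [0.0, 1.0, 1.1, 1.2, 5.0]
sim = empirical_interval_similarity(values)

central_pair = sim[1][3]  # 1.0 vs 1.2: close in value, three points in between
tail_pair = sim[3][4]     # 1.2 vs 5.0: far apart, only the two endpoints in between
print(central_pair, tail_pair)
```

In this toy example the tail pair receives the higher score even though its raw distance is much larger, which is the qualitative behavior the abstract describes. Whether this particular rule yields a valid Mercer kernel is not claimed here.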

SUBMITTER: Agarwal D 

PROVIDER: S-EPMC6892634 | biostudies-literature | 2019 Dec

REPOSITORIES: biostudies-literature

Publications

Semblance: An empirical similarity kernel on probability spaces.

Divyansh Agarwal, Nancy R. Zhang

Science Advances, 4 December 2019, issue 12


Similar Datasets

| PRJNA476369 | ENA
| S-EPMC4208055 | biostudies-literature
| S-EPMC6742806 | biostudies-literature
| S-EPMC7758075 | biostudies-literature
| S-EPMC8792612 | biostudies-literature
| S-EPMC3161911 | biostudies-literature
| S-EPMC6295467 | biostudies-literature
| S-EPMC8497014 | biostudies-literature
| S-EPMC3272581 | biostudies-literature
| S-EPMC7612173 | biostudies-literature