
Dataset Information


Heavy-tailed kernels reveal a finer cluster structure in t-SNE visualisations.


ABSTRACT: T-distributed stochastic neighbour embedding (t-SNE) is a widely used data visualisation technique. It differs from its predecessor SNE by the low-dimensional similarity kernel: the Gaussian kernel was replaced by the heavy-tailed Cauchy kernel, solving the 'crowding problem' of SNE. Here, we develop an efficient implementation of t-SNE for a t-distribution kernel with an arbitrary degree of freedom ν, with ν → ∞ corresponding to SNE and ν = 1 corresponding to the standard t-SNE. Using theoretical analysis and toy examples, we show that ν < 1 can further reduce the crowding problem and reveal finer cluster structure that is invisible in standard t-SNE. We further demonstrate the striking effect of heavier-tailed kernels on large real-life data sets such as MNIST, single-cell RNA-sequencing data, and the HathiTrust library. We use domain knowledge to confirm that the revealed clusters are meaningful. Overall, we argue that modifying the tail heaviness of the t-SNE kernel can yield additional insight into the cluster structure of the data.
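The abstract does not state the kernel formula, so the following is only an illustrative sketch under an assumption: it uses the textbook Student-t form w(d) = (1 + d²/ν)^(−(ν+1)/2), which equals the Cauchy kernel 1/(1 + d²) at ν = 1 and tends to the Gaussian exp(−d²/2) as ν → ∞. The paper's exact parametrization may differ. The function names (t_kernel, low_dim_affinities, kl_divergence, kl_gradient) are hypothetical, and the exact O(n²) gradient is meant for small toy data; it is not the efficient implementation described in the paper.

```python
import numpy as np


def t_kernel(d2, nu):
    """Assumed Student-t similarity kernel on squared distances d2.

    w = (1 + d2 / nu) ** (-(nu + 1) / 2)
    nu = 1 gives the Cauchy kernel of standard t-SNE; nu -> inf approaches
    the Gaussian exp(-d2 / 2) of SNE; nu < 1 gives heavier tails.
    """
    return (1.0 + d2 / nu) ** (-(nu + 1.0) / 2.0)


def squared_distances(Y):
    """Pairwise squared Euclidean distances between rows of Y."""
    diff = Y[:, None, :] - Y[None, :, :]
    return np.sum(diff ** 2, axis=-1)


def low_dim_affinities(Y, nu):
    """Normalised low-dimensional similarities q_ij, with q_ii = 0."""
    W = t_kernel(squared_distances(Y), nu)
    np.fill_diagonal(W, 0.0)
    return W / W.sum()


def kl_divergence(P, Y, nu):
    """KL(P || Q) for a symmetric P with zero diagonal that sums to 1."""
    Q = low_dim_affinities(Y, nu)
    mask = P > 0
    return float(np.sum(P[mask] * np.log(P[mask] / Q[mask])))


def kl_gradient(P, Y, nu):
    """Exact O(n^2) gradient of KL(P || Q) with respect to the embedding Y.

    For the assumed kernel, d(log w)/d(d2) = -(nu + 1) / (2 nu) / (1 + d2 / nu),
    which gives
        grad_i = 2 (nu + 1) / nu * sum_j (p_ij - q_ij) (y_i - y_j) / (1 + d2_ij / nu).
    At nu = 1 this reduces to the familiar t-SNE gradient with prefactor 4.
    """
    D2 = squared_distances(Y)
    Q = low_dim_affinities(Y, nu)
    F = (P - Q) / (1.0 + D2 / nu)
    np.fill_diagonal(F, 0.0)
    coef = 2.0 * (nu + 1.0) / nu
    # sum_j F_ij (y_i - y_j) = y_i * sum_j F_ij - (F @ Y)_i
    return coef * (F.sum(axis=1)[:, None] * Y - F @ Y)
```

Under this parametrization the kernel decays like d^−(ν+1) at large distances, so lowering ν below 1 makes the tail heavier than the Cauchy kernel's; for example, t_kernel(25.0, 0.5) is larger than t_kernel(25.0, 1.0). Plugging kl_gradient into any gradient-descent loop would give a naive t-SNE variant for experimenting with ν on small data.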

SUBMITTER: Kobak D 

PROVIDER: S-EPMC7582035 | biostudies-literature | 2020

REPOSITORIES: biostudies-literature


Publications

Heavy-tailed kernels reveal a finer cluster structure in t-SNE visualisations.

Dmitry Kobak, George Linderman, Stefan Steinerberger, Yuval Kluger, Philipp Berens

Machine learning and knowledge discovery in databases : European Conference, ECML PKDD ... : proceedings. ECML PKDD (Conference), 2020-04-30



Similar Datasets

| S-EPMC10010455 | biostudies-literature
| S-EPMC1125203 | biostudies-literature
| S-EPMC3906378 | biostudies-literature
| S-EPMC6334642 | biostudies-other
| S-EPMC8020077 | biostudies-literature
| S-EPMC3574063 | biostudies-literature
| S-EPMC7451580 | biostudies-literature
| S-EPMC5621483 | biostudies-literature
| S-EPMC10786257 | biostudies-literature
| S-EPMC8933315 | biostudies-literature