Dataset Information

Gaussian Embedding for Large-scale Gene Set Analysis.


ABSTRACT: Gene sets, including protein complexes and signaling pathways, have proliferated greatly, in large part as a result of high-throughput biological data. Leveraging gene sets for biological discovery requires computational methods that convert them into a form usable by existing machine learning models. Here, we study the problem of embedding gene sets as compact features that are compatible with existing machine learning software. We present Set2Gaussian, a novel network-based gene set embedding approach, which represents each gene set as a multivariate Gaussian distribution rather than a single point in the low-dimensional space, according to the proximity of these genes in a protein-protein interaction network. We demonstrate that Set2Gaussian improves gene set member identification, accurately stratifies tumors, and finds concise gene sets for gene set enrichment analysis. We further show how Set2Gaussian allows us to identify a previously unknown clinical prognostic and predictive subnetwork around NEFM in sarcoma, which we validate in independent cohorts.
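The abstract describes representing a gene set as a Gaussian distribution rather than a single point. As a rough illustration only, and not the Set2Gaussian training procedure (which, per the abstract, derives the embedding from gene proximity in a protein-protein interaction network), the sketch below assumes precomputed gene embedding vectors, fits a diagonal Gaussian to one set's members, and ranks genes by log-density as a simple proxy for gene set member identification. The function names, the diagonal covariance, and the `eps` smoothing term are hypothetical choices made for this example.

```python
import numpy as np


def fit_set_gaussian(member_vecs, eps=1e-3):
    """Fit a diagonal Gaussian (mean, per-dimension variance) to the
    embedding vectors of a gene set's members (hypothetical helper)."""
    mu = member_vecs.mean(axis=0)
    var = member_vecs.var(axis=0) + eps  # eps keeps every variance positive
    return mu, var


def log_density(x, mu, var):
    """Log-density of each row of x under N(mu, diag(var))."""
    d = x - mu
    return -0.5 * np.sum(np.log(2 * np.pi * var) + d * d / var, axis=-1)


def rank_candidates(gene_vecs, mu, var):
    """Return gene indices sorted from most to least likely set member,
    together with the raw log-density scores."""
    scores = log_density(gene_vecs, mu, var)
    return np.argsort(-scores), scores


# Toy 2-D embeddings (made up for illustration): genes 0-4 form the set,
# gene 5 lies far away from them.
emb = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
                [-0.1, 0.0], [0.0, -0.1], [3.0, 3.0]])
mu, var = fit_set_gaussian(emb[:5])
order, scores = rank_candidates(emb, mu, var)
```

A per-set variance lets a tightly clustered set and a diffuse set assign different likelihoods to a gene at the same distance from the mean, which a single point embedding cannot express.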

SUBMITTER: Wang S 

PROVIDER: S-EPMC7505077 | biostudies-literature | 2020 Jul

REPOSITORIES: biostudies-literature

Publications

Gaussian Embedding for Large-scale Gene Set Analysis.

Wang S, Flynn ER, Altman RB

Nature Machine Intelligence, 15 June 2020, issue 7

Similar Datasets

| S-EPMC2808166 | biostudies-literature
| S-EPMC3029239 | biostudies-literature
| S-EPMC6717532 | biostudies-literature
| S-EPMC4493826 | biostudies-literature
| S-EPMC5988488 | biostudies-literature
| S-EPMC9825773 | biostudies-literature
| S-EPMC2887944 | biostudies-literature
| S-EPMC5802054 | biostudies-other
| S-EPMC7351785 | biostudies-literature
| S-EPMC7473573 | biostudies-literature