
Dataset Information


Making sense out of massive data by going beyond differential expression.


ABSTRACT: With the rapid growth of publicly available high-throughput transcriptomic data, there is increasing recognition that large sets of such data can be mined to better understand disease states and mechanisms. Prior gene expression analyses, both large and small, have been dichotomous in nature, with phenotypes compared using clearly defined controls. Such approaches may require arbitrary decisions about what are considered "normal" phenotypes, and what each phenotype should be compared to. Instead, we adopt a holistic approach in which we characterize phenotypes in the context of a myriad of tissues and diseases. We introduce scalable methods that associate expression patterns with phenotypes in order both to assign phenotype labels to new expression samples and to select phenotypically meaningful gene signatures. By using a nonparametric statistical approach, we identify signatures that are more precise than those from existing approaches and accurately reveal biological processes that are hidden in case vs. control studies. Employing a comprehensive perspective on expression, we show how metastasized tumor samples localize in the vicinity of their primary-site counterparts and are overenriched for those phenotype labels. We find that our approach provides insights into the biological processes that underlie differences between tissues and diseases beyond those identified by traditional differential expression analyses. Finally, we provide an online resource (http://concordia.csail.mit.edu) for mapping users' gene expression samples onto the expression landscape of tissue and disease.

SUBMITTER: Schmid PR 

PROVIDER: S-EPMC3326474 | biostudies-literature | 2012 Apr

REPOSITORIES: biostudies-literature


Publications

Making sense out of massive data by going beyond differential expression.

Patrick R. Schmid, Nathan P. Palmer, Isaac S. Kohane, Bonnie Berger

Proceedings of the National Academy of Sciences of the United States of America, 2012-03-23, issue 15



Similar Datasets

| S-EPMC4411791 | biostudies-literature
| S-EPMC6698701 | biostudies-literature
| S-EPMC6390500 | biostudies-literature
| S-EPMC6189483 | biostudies-literature
| S-EPMC4890338 | biostudies-literature
| S-EPMC4021544 | biostudies-literature
| S-EPMC7286920 | biostudies-literature
| S-EPMC7401704 | biostudies-literature
| S-EPMC7761970 | biostudies-literature
| S-EPMC5992967 | biostudies-literature