Dataset Information

Searching for best lower dimensional visualization angles for high dimensional RNA-Seq data.


ABSTRACT: The accumulation of RNA sequencing (RNA-Seq) gene expression data in recent years has resulted in large, complex, high-dimensional data sets. Exploratory analysis, including data mining and visualization, reveals hidden patterns and potential outliers in such data, but is often challenged by the high dimensionality of the data. The scatterplot matrix is a commonly used tool for visualizing multivariate data, and allows us to view multiple bivariate relationships simultaneously. However, the scatterplot matrix becomes less effective for high-dimensional data because the number of bivariate displays increases quadratically with data dimensionality. In this study, we introduce a selection criterion for each bivariate scatterplot and design and implement an algorithm that automatically scans and ranks all possible scatterplots, with the goal of identifying the plots in which separation between two pre-defined groups is maximized. By applying our method to a multi-experiment Arabidopsis RNA-Seq data set, we were able to pinpoint the visualization angles where genes from two biological pathways are the most separated, as well as identify potential outliers.
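
The scan-and-rank idea described in the abstract can be illustrated with a minimal sketch. The abstract does not state the selection criterion, so the score used here (squared distance between group centroids divided by the pooled within-group variance) is an assumption chosen for illustration and is not the authors' criterion. The function names `separation_score` and `rank_scatterplots` and the synthetic data are likewise hypothetical.

```python
from itertools import combinations

import numpy as np


def separation_score(xy_a, xy_b):
    """Assumed criterion (not from the paper): squared centroid distance
    divided by the summed per-axis variances of the two groups."""
    gap = np.sum((xy_a.mean(axis=0) - xy_b.mean(axis=0)) ** 2)
    spread = xy_a.var(axis=0).sum() + xy_b.var(axis=0).sum()
    return gap / (spread + 1e-12)  # small constant avoids division by zero


def rank_scatterplots(data, labels):
    """Score every pair of columns and return (score, i, j) tuples, best first.

    data   : array of shape (n_points, n_dims)
    labels : boolean array, True for group A and False for group B
    """
    labels = np.asarray(labels, dtype=bool)
    group_a, group_b = data[labels], data[~labels]
    scores = []
    for i, j in combinations(range(data.shape[1]), 2):
        cols = [i, j]
        score = separation_score(group_a[:, cols], group_b[:, cols])
        scores.append((score, i, j))
    return sorted(scores, reverse=True)


# Synthetic stand-in for expression data: two groups of 50 points in 5 dimensions.
rng = np.random.default_rng(0)
n = 50
data = rng.normal(size=(2 * n, 5))
labels = np.array([True] * n + [False] * n)
data[labels, 1] += 4.0  # group A is shifted along dimensions 1 and 3 only
data[labels, 3] += 4.0

for score, i, j in rank_scatterplots(data, labels)[:3]:
    print(f"dims ({i}, {j}): score {score:.2f}")
```

With d dimensions the loop visits d(d-1)/2 column pairs, which is the quadratic growth the abstract mentions; ranking the pairs lets a reader inspect only the few top-scoring scatterplots instead of the full matrix.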

SUBMITTER: Zhang W 

PROVIDER: S-EPMC6046202 | biostudies-literature | 2018

REPOSITORIES: biostudies-literature

Publications

Searching for best lower dimensional visualization angles for high dimensional RNA-Seq data.

Zhang W, Di Y

PeerJ, 2018-07-12


Similar Datasets

| S-EPMC6417818 | biostudies-literature
| S-EPMC4542614 | biostudies-literature
| S-EPMC8938808 | biostudies-literature
| S-EPMC3717481 | biostudies-literature
| S-EPMC6894875 | biostudies-literature
| S-EPMC5009739 | biostudies-literature
| S-EPMC6498510 | biostudies-literature
| S-EPMC7191582 | biostudies-literature
| S-EPMC8123109 | biostudies-literature
| S-EPMC6500068 | biostudies-literature