
Dataset Information

A seriation approach for visualization-driven discovery of co-expression patterns in Serial Analysis of Gene Expression (SAGE) data.


ABSTRACT: BACKGROUND: Serial Analysis of Gene Expression (SAGE) is a DNA sequencing-based method for large-scale gene expression profiling that provides an alternative to microarray analysis. Most analyses of SAGE data aimed at identifying co-expressed genes have been accomplished using various versions of clustering approaches that often result in a number of false positives. PRINCIPAL FINDINGS: Here we explore the use of seriation, a statistical approach for ordering sets of objects based on their similarity, for large-scale expression pattern discovery in SAGE data. For this specific task we implement a seriation heuristic we term 'progressive construction of contigs' that constructs local chains of related elements by sequentially rearranging margins of the correlation matrix. We apply the heuristic to the analysis of simulated and experimental SAGE data and compare our results to those obtained with a clustering algorithm developed specifically for SAGE data. We show using simulations that the performance of seriation compares favorably to that of the clustering algorithm on noisy SAGE data. CONCLUSIONS: We explore the use of a seriation approach for visualization-based pattern discovery in SAGE data. Using both simulations and experimental data, we demonstrate that seriation is able to identify groups of co-expressed genes more accurately than a clustering algorithm developed specifically for SAGE data. Our results suggest that seriation is a useful method for the analysis of gene expression data whose applicability should be further pursued.
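To make the idea of seriation on a correlation matrix concrete, here is a minimal sketch. It is not the authors' 'progressive construction of contigs' heuristic, whose details are not given in this record; it swaps in a simple greedy chain-extension scheme as a stand-in. The function names, the use of Pearson correlation, and the toy tag-count profiles are all assumptions made for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def correlation_matrix(profiles):
    """Pairwise correlation matrix for a list of expression profiles."""
    n = len(profiles)
    return [[pearson(profiles[i], profiles[j]) for j in range(n)] for i in range(n)]


def greedy_seriation(corr):
    """Order items so that highly correlated items tend to sit next to each other.

    Hypothetical stand-in heuristic: seed a chain with the most correlated pair,
    then repeatedly attach the unplaced item that correlates best with either
    end of the chain.
    """
    n = len(corr)
    if n < 2:
        return list(range(n))
    pairs = ((a, b) for a in range(n) for b in range(a + 1, n))
    i, j = max(pairs, key=lambda p: corr[p[0]][p[1]])
    order = [i, j]
    remaining = set(range(n)) - {i, j}
    while remaining:
        left, right = order[0], order[-1]
        best_left = max(remaining, key=lambda k: corr[left][k])
        best_right = max(remaining, key=lambda k: corr[right][k])
        if corr[left][best_left] > corr[right][best_right]:
            order.insert(0, best_left)
            remaining.remove(best_left)
        else:
            order.append(best_right)
            remaining.remove(best_right)
    return order


# Toy tag counts (made up): rows 0, 2, 4 rise across libraries; rows 1, 3 fall.
profiles = [
    [1, 2, 3, 4],
    [4, 3, 2, 1],
    [2, 4, 6, 8],
    [8, 6, 4, 2],
    [1, 2, 3, 5],
]
order = greedy_seriation(correlation_matrix(profiles))
print(order)
```

Reordering the rows and columns of the correlation matrix by `order` places similar profiles in adjacent positions, so co-expressed groups appear as blocks along the diagonal when the matrix is drawn as a heat map.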

SUBMITTER: Morozova O 

PROVIDER: S-EPMC2527533 | biostudies-literature | 2008

REPOSITORIES: biostudies-literature

Publications

A seriation approach for visualization-driven discovery of co-expression patterns in Serial Analysis of Gene Expression (SAGE) data.

Morozova O, Morozov V, Hoffman BG, Helgason CD, Marra MA

PLoS ONE, 2008-09-12, issue 9


Similar Datasets

| S-EPMC2121609 | biostudies-literature
| S-EPMC3081805 | biostudies-literature
| S-EPMC463327 | biostudies-literature
| S-EPMC526221 | biostudies-literature
| S-EPMC4616010 | biostudies-literature
| S-EPMC517707 | biostudies-literature
| S-EPMC7831185 | biostudies-literature
| S-EPMC8034124 | biostudies-literature
| S-EPMC7672565 | biostudies-literature
| S-EPMC5472114 | biostudies-literature