Unknown

Dataset Information

0

EPIG-Seq: extracting patterns and identifying co-expressed genes from RNA-Seq data.


ABSTRACT: RNA sequencing (RNA-Seq) measures genome-wide gene expression. RNA-Seq data is count-based rendering normal distribution models for analysis inappropriate. Normalization of RNA-Seq data to transform the data has limitations which can adversely impact the analysis. Furthermore, there are a few count-based methods for analysis of RNA-Seq data but they are essentially for pairwise analysis of treatment groups or multiclasses but not pattern-based to identify co-expressed genes.We adapted our extracting patterns and identifying genes methodology for RNA-Seq (EPIG-Seq) count data. The software uses count-based correlation to measure similarity between genes, quasi-Poisson modelling to estimate dispersion in the data and a location parameter to indicate magnitude of differential expression. EPIG-Seq is different than any other software currently available for pattern analysis of RNA-Seq data in that EPIG-Seq 1) uses count level data and supports cases of inflated zeros, 2) identifies statistically significant clusters of genes that are co-expressed across experimental conditions, 3) takes into account dispersion in the replicate data and 4) provides reliable results even with small sample sizes. EPIG-Seq operates in two steps: 1) extract the pattern profiles from data as seeds for clustering co-expressed genes and 2) cluster the genes to the pattern seeds and compute statistical significance of the pattern of co-expressed genes. EPIG-Seq provides a table of the genes with bootstrapped p-values and profile plots of the patterns of co-expressed genes. In addition, EPIG-Seq provides a heat map and principal component dimension reduction plot of the clustered genes as visual aids. We demonstrate the utility of EPIG-Seq through the analysis of toxicogenomics and cancer data sets to identify biologically relevant co-expressed genes. EPIG-Seq is available at: sourceforge.net/projects/epig-seq.EPIG-Seq is unlike any other software currently available for pattern analysis of RNA-Seq count level data across experimental groups. Using the EPIG-Seq software to analyze RNA-Seq count data across biological conditions permits the ability to extract biologically meaningful co-expressed genes associated with coordinated regulation.

SUBMITTER: Li J 

PROVIDER: S-EPMC4804494 | biostudies-literature | 2016 Mar

REPOSITORIES: biostudies-literature

altmetric image

Publications

EPIG-Seq: extracting patterns and identifying co-expressed genes from RNA-Seq data.

Li Jianying J   Bushel Pierre R PR  

BMC genomics 20160322


<h4>Background</h4>RNA sequencing (RNA-Seq) measures genome-wide gene expression. RNA-Seq data is count-based rendering normal distribution models for analysis inappropriate. Normalization of RNA-Seq data to transform the data has limitations which can adversely impact the analysis. Furthermore, there are a few count-based methods for analysis of RNA-Seq data but they are essentially for pairwise analysis of treatment groups or multiclasses but not pattern-based to identify co-expressed genes.<h  ...[more]

Similar Datasets

| S-EPMC2194742 | biostudies-literature
| S-EPMC5178351 | biostudies-literature
| S-EPMC5592911 | biostudies-literature
| S-EPMC3381971 | biostudies-literature
| S-EPMC6604381 | biostudies-literature
| S-EPMC9670425 | biostudies-literature
| S-EPMC6284200 | biostudies-literature
| S-EPMC2447744 | biostudies-literature
| S-EPMC8234728 | biostudies-literature
| S-EPMC6573782 | biostudies-literature