Dataset Information

A normalization strategy for comparing tag count data.


ABSTRACT:

Background

High-throughput sequencing, such as ribonucleic acid sequencing (RNA-seq) and chromatin immunoprecipitation sequencing (ChIP-seq), enables various features of organisms to be compared through tag counts. Recent studies have demonstrated that the normalization step for RNA-seq data is critical for accurate downstream analysis of differential gene expression. A more robust normalization method is therefore desirable for identifying true differences in tag count data.

Results

We describe a strategy for normalizing tag count data, focusing on RNA-seq. The key concept is to remove data assigned as potential differentially expressed genes (DEGs) before calculating the normalization factor. Several R packages for identifying DEGs are currently available, and each package uses its own normalization method and gene ranking algorithm. We compared a total of eight package combinations: four R packages (edgeR, DESeq, baySeq, and NBPSeq), each with its default normalization settings and with our normalization strategy. Many synthetic datasets under various scenarios were evaluated using the area under the curve (AUC) as a combined measure of sensitivity and specificity. We found that packages using our strategy in the data normalization step performed well overall. This result was also observed for a real experimental dataset.
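The key concept, removing potential DEGs before calculating the normalization factor, can be illustrated with a minimal sketch. This is not the authors' implementation and does not reproduce any of the R packages named above. The function names, the `fraction` parameter, the total-count scaling, and the log-ratio ranking used to flag potential DEGs are all assumptions chosen only to keep the example short.

```python
import math


def size_factors(counts, keep=None):
    """Per-sample scaling factors from total counts over the kept genes.

    counts: list of rows, one per gene, each a list of per-sample tag counts.
    keep:   optional set of gene indices to use; all genes if None.
    The factors are scaled so that their mean is 1.
    """
    n_samples = len(counts[0])
    rows = range(len(counts)) if keep is None else keep
    totals = [sum(counts[g][s] for g in rows) for s in range(n_samples)]
    mean_total = sum(totals) / n_samples
    return [t / mean_total for t in totals]


def flag_potential_degs(counts, factors, group_a, group_b, fraction):
    """Hypothetical stand-in for a DEG-ranking step (assumption, not from
    the source): flag the genes with the largest absolute log2 ratio of
    normalized group means."""
    scores = []
    for g, row in enumerate(counts):
        norm = [c / f for c, f in zip(row, factors)]
        mean_a = sum(norm[s] for s in group_a) / len(group_a)
        mean_b = sum(norm[s] for s in group_b) / len(group_b)
        # A pseudocount of 1 avoids taking the log of zero.
        scores.append((abs(math.log2((mean_a + 1) / (mean_b + 1))), g))
    scores.sort(reverse=True)
    n_flag = int(len(counts) * fraction)
    return {g for _, g in scores[:n_flag]}


def deg_elimination_factors(counts, group_a, group_b, fraction=0.2):
    """Normalize, flag potential DEGs, then renormalize without them."""
    first_pass = size_factors(counts)
    flagged = flag_potential_degs(counts, first_pass, group_a, group_b, fraction)
    keep = set(range(len(counts))) - flagged
    return size_factors(counts, keep)


# Toy data: 8 unchanged genes plus 2 genes that are 10-fold higher in sample 1.
toy_counts = [[100, 100] for _ in range(8)] + [[100, 1000] for _ in range(2)]
naive = size_factors(toy_counts)
refined = deg_elimination_factors(toy_counts, group_a=[0], group_b=[1])
```

In this toy case the two strongly changed genes inflate the total count of the second sample, so the naive factors differ between samples even though most genes are unchanged. After the flagged genes are excluded, the factors are computed from the unchanged genes only. The sketch does not guard against the case where every gene is flagged.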

Conclusion

Our results showed that eliminating potential DEGs is essential for more accurate normalization of RNA-seq data. The concept of this normalization strategy can be widely applied to other types of tag count data and to microarray data.

SUBMITTER: Kadota K 

PROVIDER: S-EPMC3341196 | biostudies-literature | 2012 Apr

REPOSITORIES: biostudies-literature

Publications

A normalization strategy for comparing tag count data.

Kadota Koji, Nishiyama Tomoaki, Shimizu Kentaro

Algorithms for molecular biology : AMB 2012-04-05 1



Similar Datasets

| S-EPMC3716788 | biostudies-literature
| S-EPMC6980212 | biostudies-literature
| S-EPMC5885979 | biostudies-literature
| S-EPMC4702063 | biostudies-literature
| S-EPMC1906839 | biostudies-literature
| S-EPMC4625728 | biostudies-literature
| S-EPMC7409812 | biostudies-literature
| S-EPMC2748095 | biostudies-literature
| S-EPMC3868625 | biostudies-literature
| S-EPMC10491191 | biostudies-literature