
Dataset Information


High heterogeneity undermines generalization of differential expression results in RNA-Seq analysis.


ABSTRACT:

Background

RNA sequencing (RNA-Seq) is widely used in oncology to monitor transcriptome changes. However, few studies have examined whether the high variation in gene expression levels caused by tumor heterogeneity affects the reproducibility of differential expression (DE) results. Here, we investigated the reproducibility of DE results for every number of biological replicates from 3 to 24 and explored why many differentially expressed genes (DEGs) were not reproducible.

Results

Our findings demonstrate that DE results reproduce poorly not only at small sample sizes but also at relatively large ones. Many of the detected DEGs are specific to the samples in use rather than genuinely differentially expressed between conditions. The poor reproducibility is mainly caused by high variation in the expression level of the same gene across samples. Although biological variation may account for much of this variation, outlier count data must also be taken seriously, because outliers severely interfere with DE analysis.
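To illustrate how a single outlier count can interfere with DE analysis, the following minimal sketch compares a mean-based and a median-based log2 fold change for one gene. The counts are invented for illustration and are not taken from the study, and the helper `log2_fold_change` with its pseudocount is a hypothetical stand-in, not the method used by the authors or by any particular DE tool.

```python
import math
import statistics


def log2_fold_change(group_a, group_b, center=statistics.mean, pseudocount=1.0):
    """log2 ratio of the two group centers; the pseudocount avoids log(0)."""
    return math.log2((center(group_b) + pseudocount) / (center(group_a) + pseudocount))


# Invented read counts for one gene (assumption: not real data).
normal = [100, 110, 95, 105, 90, 100]
tumor = [102, 98, 107, 93, 101, 99]
# Same tumor group, but the last sample is replaced by one extreme count.
tumor_with_outlier = tumor[:-1] + [5000]

lfc_clean = log2_fold_change(normal, tumor)
lfc_outlier_mean = log2_fold_change(normal, tumor_with_outlier)
lfc_outlier_median = log2_fold_change(
    normal, tumor_with_outlier, center=statistics.median
)

print(f"no outlier, mean-based:     {lfc_clean:.2f}")
print(f"one outlier, mean-based:    {lfc_outlier_mean:.2f}")
print(f"one outlier, median-based:  {lfc_outlier_median:.2f}")
```

In this toy case one extreme sample turns a gene with no difference between groups into an apparent change of more than eightfold when group means are compared, while the median-based estimate barely moves.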

Conclusions

High heterogeneity exists not only in the tumor tissue samples of each cancer type studied but also in normal samples. It leads to poor reproducibility of DEGs and undermines the generalization of DE results. RNA-Seq experimental designs should therefore use large sample sizes (at least 10 if possible) to reduce the impact of biological variability, and DE results should be interpreted cautiously unless soundly validated.
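The kind of reproducibility check described in the abstract can be sketched as a subsampling experiment: draw two disjoint sets of replicates from a larger pool, call DEGs on each, and measure how much the two gene lists overlap. The sketch below is a toy under stated assumptions, not the authors' pipeline: the data are simulated on a log scale, a Welch t-statistic with an arbitrary cutoff stands in for a real DE tool, replicate counts are taken per group, and all function names and parameter values are hypothetical.

```python
import math
import random
import statistics


def welch_t(a, b):
    """Welch's t-statistic for two samples (no equal-variance assumption)."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return 0.0 if se == 0 else (statistics.mean(b) - statistics.mean(a)) / se


def call_degs(control, treated, cutoff=3.0):
    """Indices of genes whose |t| exceeds an arbitrary cutoff (assumption)."""
    return {
        g
        for g in range(len(control[0]))
        if abs(welch_t([s[g] for s in control], [s[g] for s in treated])) > cutoff
    }


def jaccard(x, y):
    """Overlap of two sets; treated as 1.0 when both are empty."""
    union = x | y
    return 1.0 if not union else len(x & y) / len(union)


def simulate(n_samples, n_genes, shift, sd, rng):
    """Simulated log-scale expression: the first 20 genes are shifted by `shift`."""
    return [
        [rng.gauss(shift if g < 20 else 0.0, sd) for g in range(n_genes)]
        for _ in range(n_samples)
    ]


def reproducibility(n_reps, control_pool, treated_pool, rng, trials=10):
    """Mean Jaccard overlap of DEG sets from two disjoint subsamples."""
    scores = []
    for _ in range(trials):
        c = rng.sample(control_pool, 2 * n_reps)
        t = rng.sample(treated_pool, 2 * n_reps)
        first = call_degs(c[:n_reps], t[:n_reps])
        second = call_degs(c[n_reps:], t[n_reps:])
        scores.append(jaccard(first, second))
    return statistics.mean(scores)


rng = random.Random(0)
control_pool = simulate(48, 100, 0.0, 1.0, rng)
treated_pool = simulate(48, 100, 1.0, 1.0, rng)
results = {
    n: reproducibility(n, control_pool, treated_pool, rng) for n in (3, 6, 12, 24)
}
for n, score in results.items():
    print(f"{n:>2} replicates per group: mean Jaccard overlap = {score:.2f}")
```

The exact overlap values depend on the seed, the simulated effect size, and the cutoff, so they should not be read as estimates for real RNA-Seq data. Raising `sd` in `simulate` is one way to explore how greater sample-to-sample variation changes the overlap at a given number of replicates.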

SUBMITTER: Cui W 

PROVIDER: S-EPMC7845028 | biostudies-literature | 2021 Jan

REPOSITORIES: biostudies-literature


Publications

High heterogeneity undermines generalization of differential expression results in RNA-Seq analysis.

Cui W, Xue H, Wei L, Jin J, Tian X, Wang Q

Human Genomics, 2021-01-28, issue 1



Similar Datasets

| S-EPMC4393055 | biostudies-literature
| S-EPMC5875907 | biostudies-literature
| S-EPMC3663822 | biostudies-literature
| S-EPMC8214188 | biostudies-literature
| S-EPMC6423143 | biostudies-literature
| S-EPMC4593828 | biostudies-other
| S-EPMC6954399 | biostudies-literature
| S-EPMC6157076 | biostudies-literature
| S-EPMC4670015 | biostudies-literature
| S-EPMC4304217 | biostudies-other