
Dataset Information


Biomarker Identification from RNA-Seq Data using a Robust Statistical Approach.


ABSTRACT: Identifying biomarkers from differentially expressed genes (DEGs) in RNA-sequencing data is an important task in characterizing transcriptomic data, made possible by advances in next-generation sequencing (NGS) technology. Several statistical techniques identify DEGs from high-dimensional RNA-seq count data across groups or conditions, including edgeR, SAMSeq, and voom-limma. However, these methods produce high false-positive rates and low accuracy in the presence of outliers. We describe a robust t-statistic method that overcomes these drawbacks, evaluated on both simulated and real RNA-seq datasets. At 20% outliers, the reported model performance is 61.2% sensitivity, 35.2% specificity, 21.6% MER, 6.9% FDR, 74.5% AUC, 78.4% ACC, 93.1% PPV, and 35.2% NPV. Using the robust t-test on a real HIV viremic vs. aviremic dataset, we identified 409 DE genes with p-values < 0.05. Of these, 28 genes are up-regulated and 381 are down-regulated, as estimated by the log2 fold change (FC) approach at a threshold value of 1.5. The up-regulated genes form three clusters, and 11 genes are found to be highly associated with HIV-1/AIDS. Protein-protein interaction (PPI) analysis of the up-regulated genes using the STRING database found 21 genes with strong associations among themselves. Thus, the identification of potential biomarkers from an RNA-seq dataset using a robust t-statistic model is demonstrated.
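
The up/down labeling step described in the abstract can be illustrated with a minimal sketch. This is not the authors' robust t-statistic: the p-value is taken as an input from whatever test is used. The function names, the pseudocount, and the reading of the threshold as a cutoff on the absolute log2 fold change are all assumptions made for illustration.

```python
from math import log2
from statistics import mean


def log2_fold_change(group_a, group_b, pseudocount=1.0):
    """log2 ratio of mean counts (group_a over group_b).

    The pseudocount is an assumption added here to avoid log(0);
    the abstract does not say how zero counts are handled.
    """
    return log2((mean(group_a) + pseudocount) / (mean(group_b) + pseudocount))


def label_gene(log2fc, p_value, threshold=1.5, alpha=0.05):
    """Label one gene from its log2 FC and a precomputed p-value.

    Assumption: the 1.5 threshold applies to |log2 FC|; the abstract
    does not state whether it applies to FC or to log2 FC.
    """
    if p_value >= alpha:
        return "not DE"
    if log2fc >= threshold:
        return "up"
    if log2fc <= -threshold:
        return "down"
    return "DE, below FC threshold"


# Hypothetical counts for a single gene in two conditions.
condition_a = [7, 7, 7]
condition_b = [1, 1, 1]
lfc = log2_fold_change(condition_a, condition_b)
label = label_gene(lfc, p_value=0.01)
```

Taking the p-value as an argument keeps the labeling logic independent of the test itself, so a robust statistic can replace a classical one without changing this step.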

SUBMITTER: Akond Z 

PROVIDER: S-EPMC6016759 | biostudies-literature | 2018

REPOSITORIES: biostudies-literature


Publications

Biomarker Identification from RNA-Seq Data using a Robust Statistical Approach.

Akond Zobaer, Alam Munirul, Mollah Md Nurul Haque

Bioinformation, 2018-04-30, 4



Similar Datasets

| S-EPMC4054007 | biostudies-other
| S-EPMC3622290 | biostudies-literature
| S-EPMC3218220 | biostudies-literature
| S-EPMC6360649 | biostudies-literature
| S-EPMC10150572 | biostudies-literature
| S-EPMC3179659 | biostudies-literature
| S-EPMC4992401 | biostudies-literature
| S-EPMC5473255 | biostudies-literature
| S-EPMC4012495 | biostudies-literature
| S-EPMC9649652 | biostudies-literature