Dataset Information

UGM: a more stable procedure for large-scale multiple testing problems, new solutions to identify oncogene.


ABSTRACT: Variations in gene expression levels play an important role in tumors. There are numerous methods for identifying differentially expressed genes in high-throughput sequencing data, and several algorithms endeavor to identify distinctive genetic patterns that are susceptible to particular diseases. Although these procedures have proved successful, the number of non-differentially expressed genes, as measured by the false discovery rate (FDR), has a large standard deviation, and the misidentification rate (type I error) grows rapidly as the number of genes to be tested becomes larger. In this study we developed a new method, Unit Gamma Measurement (UGM), which accounts for the distribution of the multiple hypothesis test statistics and could reduce the dependency problem. Simulated expression profile data and breast cancer RNA-Seq data were used to test the accuracy of UGM. The results show that the number of non-differentially expressed genes identified by UGM is very close to the real-evidence data, and that UGM also has a smaller standard error, range, quartile range and RMS error. In addition, UGM can be used to screen many breast cancer-associated genes, such as BRCA1, BRCA2, PTEN and BRIP1, and it provides better accuracy, robustness and efficiency as a method for identifying differentially expressed genes in high-throughput sequencing.
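For readers unfamiliar with the multiple testing problem the abstract refers to, the following is a minimal Python sketch of two standard ideas: how the chance of at least one type I error grows with the number of independent tests, and the classical Benjamini-Hochberg step-up procedure for FDR control. This is not the UGM method, whose details are not given in this record; the function names, the default alpha of 0.05, and the example p-values are illustrative assumptions.

```python
def prob_any_false_positive(m, alpha=0.05):
    """Chance of at least one type I error across m independent tests,
    each run at level alpha, when every null hypothesis is true."""
    return 1.0 - (1.0 - alpha) ** m


def benjamini_hochberg(p_values, alpha=0.05):
    """Classical Benjamini-Hochberg step-up procedure (not UGM).

    Returns a list of booleans, True where the hypothesis is rejected.
    """
    m = len(p_values)
    # Indices of the p-values in ascending order of p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest 1-based rank k whose p-value satisfies p_(k) <= (k / m) * alpha.
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_k = rank
    # Reject the max_k hypotheses with the smallest p-values.
    rejected = [False] * m
    for idx in order[:max_k]:
        rejected[idx] = True
    return rejected


if __name__ == "__main__":
    # Hypothetical p-values for ten genes (illustrative only).
    example = [0.001, 0.008, 0.039, 0.041, 0.042,
               0.060, 0.074, 0.205, 0.212, 0.216]
    print(benjamini_hochberg(example))
    for m in (1, 10, 100, 1000):
        print(m, round(prob_any_false_positive(m), 4))
```

The first function shows why uncorrected testing becomes unreliable as the number of genes grows; the second is the usual baseline against which alternative FDR procedures are compared.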

SUBMITTER: Liu C 

PROVIDER: S-EPMC6927121 | biostudies-literature | 2019 Dec

REPOSITORIES: biostudies-literature

Publications

UGM: a more stable procedure for large-scale multiple testing problems, new solutions to identify oncogene.

Liu Chengyou, Zhou Leilei, Wang Yuhe, Tian Shuchang, Zhu Junlin, Qin Hang, Ding Yong, Jiang Hongbing

Theoretical Biology & Medical Modelling, 2019 Dec 23 (issue 1)


Similar Datasets

S-EPMC3500624 | biostudies-literature
S-EPMC6565503 | biostudies-literature
S-EPMC4699760 | biostudies-literature
S-EPMC5988488 | biostudies-literature
S-EPMC5854297 | biostudies-literature
S-EPMC4894362 | biostudies-literature
S-EPMC7870284 | biostudies-literature
S-EPMC9297917 | biostudies-literature
S-EPMC7399816 | biostudies-literature
S-EPMC2194726 | biostudies-literature