Dataset Information

MCMC implementation of the optimal Bayesian classifier for non-Gaussian models: model-based RNA-Seq classification.


ABSTRACT:

Background

Sequencing datasets consist of a finite number of reads which map to specific regions of a reference genome. Most effort in modeling these datasets focuses on the detection of univariate differentially expressed genes. However, for classification, we must consider multiple genes and their interactions.

Results

Thus, we introduce a hierarchical multivariate Poisson model (MP) and the associated optimal Bayesian classifier (OBC) for classifying samples using sequencing data. Because the model lacks closed-form solutions, we employ a Markov chain Monte Carlo (MCMC) approach to perform classification. We demonstrate superior or equivalent classification performance compared to typical classifiers on two synthetic datasets across a range of classification problem difficulties. We also introduce the Bayesian minimum mean squared error (MMSE) conditional error estimator and demonstrate its computation over the feature space. In addition, we demonstrate superior or class-leading performance on an RNA-Seq dataset containing two lung cancer tumor types from The Cancer Genome Atlas (TCGA).
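The general idea of classifying count data by averaging a Poisson likelihood over MCMC posterior samples can be illustrated with a small sketch. This is not the authors' model or code: it assumes independent genes (the MP model in the abstract is multivariate and captures gene interactions), a Normal prior on each log-rate, and a plain random-walk Metropolis sampler. All function names, prior settings, and the toy counts below are hypothetical.

```python
import math
import numpy as np

def log_poisson(x, log_rate):
    # Log Poisson pmf for count x with rate exp(log_rate).
    return x * log_rate - math.exp(log_rate) - math.lgamma(x + 1)

def sample_log_rate(counts, n_samples=2000, burn_in=500, step=0.2,
                    prior_mean=0.0, prior_sd=3.0, seed=0):
    # Random-walk Metropolis over the log-rate of one gene in one class.
    # Assumed toy model: counts ~ Poisson(exp(lam)), lam ~ Normal(prior_mean, prior_sd^2).
    rng = np.random.default_rng(seed)
    counts = np.asarray(counts, dtype=float)

    def log_post(lam):
        log_prior = -0.5 * ((lam - prior_mean) / prior_sd) ** 2
        # Poisson log-likelihood up to a constant that does not depend on lam.
        log_lik = counts.sum() * lam - counts.size * math.exp(lam)
        return log_prior + log_lik

    lam = math.log(counts.mean() + 0.5)  # start near the sample mean
    current = log_post(lam)
    samples = []
    for i in range(n_samples + burn_in):
        proposal = lam + step * rng.normal()
        candidate = log_post(proposal)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, candidate - current)):
            lam, current = proposal, candidate
        if i >= burn_in:
            samples.append(lam)
    return np.array(samples)

def log_effective_density(x, samples):
    # Log of the Monte Carlo average of the Poisson pmf over posterior samples,
    # computed with the log-sum-exp trick for numerical stability.
    logs = np.array([log_poisson(x, s) for s in samples])
    m = logs.max()
    return m + math.log(np.mean(np.exp(logs - m)))

def classify(x_new, posterior_samples, class_priors=(0.5, 0.5)):
    # Pick the class with the largest prior-weighted effective density.
    scores = []
    for y, per_gene in enumerate(posterior_samples):
        score = math.log(class_priors[y])
        for x, s in zip(x_new, per_gene):
            score += log_effective_density(x, s)
        scores.append(score)
    return int(np.argmax(scores))

# Made-up training counts: rows are samples, columns are two genes.
train = {
    0: np.array([[3, 20], [5, 18], [4, 25]]),
    1: np.array([[15, 6], [12, 4], [18, 7]]),
}
posterior_samples = [
    [sample_log_rate(train[y][:, g], seed=10 * y + g) for g in range(2)]
    for y in (0, 1)
]
label_a = classify([4, 22], posterior_samples)
label_b = classify([16, 5], posterior_samples)
```

Averaging the likelihood over posterior samples, rather than plugging in a single point estimate of the rate, is what distinguishes this kind of Bayesian classifier from a plug-in rule; with few training samples the two can disagree noticeably.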

Conclusions

Through model-based, optimal Bayesian classification, we demonstrate superior classification performance for both synthetic and real RNA-Seq datasets. A tutorial video and Python source code are available under an open source license at http://bit.ly/1gimnss .

SUBMITTER: Knight JM 

PROVIDER: S-EPMC4265360 | biostudies-literature | 2014 Dec

REPOSITORIES: biostudies-literature

Publications

MCMC implementation of the optimal Bayesian classifier for non-Gaussian models: model-based RNA-Seq classification.

Knight JM, Ivanov I, Dougherty ER

BMC Bioinformatics, 2014-12-10


Similar Datasets

| S-EPMC4818202 | biostudies-other
| S-EPMC6916355 | biostudies-literature
| S-EPMC3753118 | biostudies-literature
| S-EPMC7614421 | biostudies-literature
| S-EPMC8052637 | biostudies-literature
| S-EPMC7891623 | biostudies-literature
| S-EPMC7672693 | biostudies-literature
| S-EPMC7297975 | biostudies-literature
| S-EPMC5031942 | biostudies-literature
| S-EPMC1941756 | biostudies-literature