Dataset Information

A mixture model for expression deconvolution from RNA-seq in heterogeneous tissues.

ABSTRACT: RNA-seq, a next-generation sequencing-based method for transcriptome analysis, is rapidly emerging as the method of choice for comprehensive transcript abundance estimation. The accuracy of RNA-seq can be strongly affected by the purity of samples. A prominent, outstanding problem in RNA-seq is how to estimate transcript abundances in heterogeneous tissues, where a sample is composed of more than one cell type and the inhomogeneity can substantially confound the transcript abundance estimation of each individual cell type. Although experimental methods have been proposed to dissect multiple distinct cell types, computationally "deconvoluting" heterogeneous tissues provides an attractive alternative, since it keeps the tissue sample as well as the subsequent molecular content yield intact.

Here we propose a probabilistic model-based approach, Transcript Estimation from Mixed Tissue samples (TEMT), to estimate the transcript abundances of each cell type of interest from RNA-seq data of heterogeneous tissue samples. TEMT incorporates positional and sequence-specific biases, and its online EM algorithm requires only a runtime proportional to the data size and a small, constant amount of memory. We test the proposed method on both simulated data and recently released ENCODE data, and show that TEMT significantly outperforms current state-of-the-art methods that do not take tissue heterogeneity into account. Currently, TEMT only resolves tissue heterogeneity resulting from two cell types, but it can be extended to handle heterogeneity resulting from multiple cell types. TEMT is written in Python and is freely available at https://github.com/uci-cbcl/TEMT.

The probabilistic model-based approach proposed here provides a new method for analyzing RNA-seq data from heterogeneous tissue samples. By applying the method to both simulated data and ENCODE data, we show that explicitly accounting for tissue heterogeneity can significantly improve the accuracy of transcript abundance estimation.
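
To make the two-cell-type mixture idea in the abstract concrete, here is a minimal toy sketch of a batch EM for a two-component multinomial mixture. It is not the TEMT implementation: it ignores read-to-transcript mapping ambiguity, positional and sequence-specific biases, and the online update scheme the abstract mentions. It also assumes, purely for illustration, that per-transcript read counts are available for a mixed sample and for a pure sample of one cell type, and that the fraction `p` of mixed reads coming from the cell type of interest is known. The function name `deconvolve_two_types` and all variable names are hypothetical.

```python
def deconvolve_two_types(mixed_counts, pure_b_counts, p, n_iter=500):
    """Toy batch EM for a two-cell-type mixture (illustrative, not TEMT).

    mixed_counts[t]  : reads assigned to transcript t in the mixed sample
    pure_b_counts[t] : reads assigned to transcript t in a pure type-B sample
    p                : assumed-known fraction of mixed reads from type A (0 < p < 1)
    Returns (a, b): estimated per-transcript read proportions for types A and B.
    """
    n = len(mixed_counts)
    a = [1.0 / n] * n  # uniform starting guess for cell type A
    b = [1.0 / n] * n  # uniform starting guess for cell type B
    for _ in range(n_iter):
        # E-step: probability that a mixed read on transcript t came from type A.
        resp = []
        for a_t, b_t in zip(a, b):
            denom = p * a_t + (1.0 - p) * b_t
            resp.append(p * a_t / denom if denom > 0 else 0.0)
        # M-step: re-estimate proportions from the expected read assignments.
        a_counts = [m * r for m, r in zip(mixed_counts, resp)]
        b_counts = [c + m * (1.0 - r)
                    for c, m, r in zip(pure_b_counts, mixed_counts, resp)]
        a_total, b_total = sum(a_counts), sum(b_counts)
        a = [x / a_total for x in a_counts]
        b = [x / b_total for x in b_counts]
    return a, b


# Synthetic, noise-free example: type A = (0.5, 0.3, 0.2), type B = (0.2, 0.2, 0.6),
# with 60% of the mixed-sample reads coming from type A.
mixed = [3800, 2600, 3600]   # 10000 * (0.6 * A + 0.4 * B)
pure_b = [2000, 2000, 6000]  # 10000 * B
a_hat, b_hat = deconvolve_two_types(mixed, pure_b, p=0.6)
print([round(x, 3) for x in a_hat])
print([round(x, 3) for x in b_hat])
```

With real data the counts are noisy and reads map ambiguously to transcripts, which is where a full model such as the one described in the abstract would differ from this sketch.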

SUBMITTER: Li Y

PROVIDER: S-EPMC3622628 | biostudies-literature | 2013

REPOSITORIES: biostudies-literature


Publications

A mixture model for expression deconvolution from RNA-seq in heterogeneous tissues.

Li Yi, Xie Xiaohui

BMC Bioinformatics, 2013-04-10


PMID: 23735186

Similar Datasets

Project description:

Background: Circular RNA (circRNA) is a type of noncoding RNA that forms a covalently closed continuous loop. Similar to long noncoding RNA (lncRNA), circRNA can act as a microRNA (miRNA) 'sponge' to regulate gene expression, and its abnormal expression is related to diseases such as atherosclerosis, nervous system disorders and cancer. So far, there have been no systematic studies on circRNA abundance and expression profiles in human adult and fetal tissues.

Results: We explored circRNA expression profiles using RNA-seq data for six adult and fetal normal tissues (colon, heart, kidney, liver, lung, and stomach) and four normal gland tissues (adrenal gland, mammary gland, pancreas, and thyroid gland). A total of 8120, 25,933 and 14,433 circRNAs were detected by at least two supporting junction reads in adult, fetal and gland tissues, respectively. Among them, 3092, 14,241 and 6879 circRNAs were novel when compared to the published results. In each adult tissue type, we found at least 1000 circRNAs, among which 36.97-50.04% were tissue-specific. We reported 33 circRNAs that were ubiquitously expressed in all the adult tissues we examined. To further explore the potential "housekeeping" function of these circRNAs, we constructed a circRNA-miRNA-mRNA regulatory network containing 17 circRNAs, 22 miRNAs and 90 mRNAs. Furthermore, we found that both the abundance and the relative expression level of circRNAs were higher in fetal tissue than in adult tissue. The number of circRNAs in gland tissues, especially in mammary gland (9665 circRNA candidates), was higher than that in other adult tissues (1160-3777).

Conclusions: We systematically investigated circRNA expression in a variety of human adult and fetal tissues. Our observation of different expression levels of circRNAs in adult and fetal tissues suggested that circRNAs might play their role in a tissue-specific and development-specific fashion. Analysis of the circRNA-miRNA-mRNA network provided potential targets of circRNAs. The high expression level of circRNAs in mammary gland might be attributed to its rich innervation.


