Dataset Information

A comparative study of RNA-seq analysis strategies.

ABSTRACT: Three principal approaches have been proposed for inferring the set of transcripts expressed in RNA samples using RNA-seq. The simplest approach uses curated annotations and assumes that the transcripts in a sample are a subset of the transcripts listed in a curated database. A more ambitious method involves aligning reads to a reference genome and using the alignments to infer the transcript structures, possibly with the aid of a curated transcript database. The most challenging approach is to assemble reads into putative transcripts de novo without the aid of reference data. We have systematically assessed the properties of these three approaches through a simulation study. We have found that the sensitivity of computational transcript set estimation is severely limited. Computational approaches (both genome-guided and de novo assembly) produce a large number of artefacts, which are assigned large expression estimates and absorb a substantial proportion of the signal when performing expression analysis. The approach using curated annotations shows good expression correlation even when the annotations are incomplete. Furthermore, any incorrect transcripts present in a curated set do not absorb much signal, so it is preferable for a curated set to have high sensitivity rather than high precision. Software to simulate transcript sets, expression values and sequence reads under a wide range of parameter values and to compare sensitivity, precision and signal-to-noise ratios of different methods is freely available online (https://github.com/boboppie/RSSS) and can be expanded by interested parties to include methods other than the exemplars presented in this article.
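The quantities named in the abstract can be illustrated with a minimal sketch. It assumes that transcripts are identified by plain string IDs and that an estimated transcript counts as correct only if its ID is in the true set. The function names and the toy data are hypothetical and are not taken from the RSSS software, which may define these measures differently.

```python
def sensitivity_precision(true_set, estimated_set):
    """Return (sensitivity, precision) of an estimated transcript set.

    sensitivity = fraction of true transcripts that were recovered
    precision   = fraction of estimated transcripts that are true
    """
    true_set, estimated_set = set(true_set), set(estimated_set)
    recovered = len(true_set & estimated_set)
    sensitivity = recovered / len(true_set) if true_set else 0.0
    precision = recovered / len(estimated_set) if estimated_set else 0.0
    return sensitivity, precision


def artefact_signal_fraction(true_set, expression):
    """Fraction of total estimated expression assigned to transcripts
    that are not in the true set (the signal 'absorbed' by artefacts)."""
    total = sum(expression.values())
    if total == 0:
        return 0.0
    spurious = sum(v for t, v in expression.items() if t not in true_set)
    return spurious / total


# Toy example (made-up values): four true transcripts, an estimate that
# recovers two of them and adds one artefact with a large expression value.
truth = {"t1", "t2", "t3", "t4"}
estimated_expression = {"t1": 10.0, "t2": 30.0, "x1": 60.0}

sens, prec = sensitivity_precision(truth, estimated_expression.keys())
absorbed = artefact_signal_fraction(truth, estimated_expression)
print(f"sensitivity={sens:.2f} precision={prec:.2f} artefact signal={absorbed:.2f}")
```

In this toy case a single artefact takes more than half of the estimated signal even though two of the three estimated transcripts are correct, which is the kind of effect the abstract describes for computational transcript set estimation.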

SUBMITTER: Janes J

PROVIDER: S-EPMC4652615 | biostudies-literature | 2015 Nov

REPOSITORIES: biostudies-literature


Publications

A comparative study of RNA-seq analysis strategies.

Jänes J, Hu F, Lewin A, Turro E

Briefings in Bioinformatics, 2015 Mar 18, issue 6


PMID: 25788326

Similar Datasets

Project description: Background: Bud dormancy is an important biological phenomenon of perennial plants that enables them to survive under harsh environmental circumstances. Grape (Vitis vinifera) is one of the most widely grown fruit crops worldwide; however, the mechanisms underlying grape bud dormancy are not yet clear. This work aimed to explore the molecular mechanism regulating bud dormancy in grape. Results: We performed transcriptome and differential transcript expression analyses of "Shine Muscat" grape buds using the Illumina RNA-seq system. Comparisons of transcript expression levels among three stages of dormancy, paradormancy (PD) vs endodormancy (ED), summer buds (SB) vs ED and SB vs PD, resulted in the detection of 8949, 9780 and 3938 differentially expressed transcripts, respectively. Out of approximately 78 million high-quality generated reads, 6096 transcripts were differentially expressed (log2 ratio ≥ 1, FDR ≤ 0.001). The grape reference genome was used for alignment of sequence reads and to measure the expression level of transcripts. The findings were then compared against two databases, Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG), to annotate the transcript descriptions and to assign a pathway to each transcript. KEGG analysis revealed that secondary metabolite biosynthesis and plant hormone signaling were the most enriched of the 127 total pathways. In the comparisons of the PD vs ED and SB vs ED stages of grape buds, the gibberellin (GA) and abscisic acid (ABA) pathways were found to be the most enriched. The ABA and GA pathways were further analyzed to observe the expression pattern of differentially expressed transcripts. Transcripts related to the PP2C family (ABA pathway) were found to be up-regulated in the PD vs ED comparison and down-regulated in the SB vs ED and SB vs PD comparisons. GID1 family transcripts (GA pathway) were up-regulated while DELLA family transcripts were down-regulated during the three dormancy stages. Differentially expressed transcripts (DEGs) related to redox activity were abundant in the GO biological process category. RT-qPCR assay results for 12 selected transcripts validated the data obtained by RNA-seq. Conclusion: At this stage, taking into account the results obtained so far, it is possible to put forward a hypothesis for the molecular mechanism underlying grape bud dormancy, which may pave the way for ultimate improvements in the grape industry.

Project description: Global gene expression in the liver transcriptome varies among cattle breeds. The present investigation aimed to identify the differentially expressed genes (DEGs), metabolic gene networks and metabolic pathways in the bovine liver transcriptome of young bulls. In this study, we comparatively analyzed the bovine liver transcriptome of dairy (Polish Holstein Friesian (HF); n = 6), beef (Hereford; n = 6), and dual purpose (Polish-Red; n = 6) cattle breeds. This study identified 895, 338, and 571 significant (p < 0.01) differentially expressed (DE) gene-transcripts represented as 745, 265, and 498 hepatic DE genes through the Polish-Red versus Hereford, Polish-HF versus Hereford, and Polish-HF versus Polish-Red breed comparisons, respectively. By combining all breed comparisons, 75 hepatic DE genes (p < 0.01) were identified as commonly shared among all three breed comparisons; 70, 160, and 38 hepatic DE genes were commonly shared between the following comparisons: (i) Polish-Red versus Hereford and Polish-HF versus Hereford; (ii) Polish-Red versus Hereford and Polish-HF versus Polish-Red; and (iii) Polish-HF versus Hereford and Polish-HF versus Polish-Red, respectively. A total of 440, 82, and 225 hepatic DE genes were uniquely observed for the Polish-Red versus Hereford, Polish-HF versus Hereford, and Polish-Red versus Polish-HF comparisons, respectively. Gene ontology (GO) analysis identified top-ranked enriched GO terms (p < 0.01) including 17, 16, and 31 functional groups and 151, 61, and 140 gene functions that were DE in all three breed liver transcriptome comparisons. Gene network analysis identified several potential metabolic pathways involved in glutamine family amino-acid, triglyceride synthesis, gluconeogenesis, p38MAPK cascade regulation, cholesterol biosynthesis (Polish-Red versus Hereford); IGF-receptor signaling, catecholamine transport, lipoprotein lipase, tyrosine kinase binding receptor (Polish-HF versus Hereford); and PGF-receptor binding (Polish-HF versus Polish-Red). Validation results showed that the relative expression values were consistent with those obtained by RNA-seq, and that quantitative reverse transcription PCR (RT-qPCR) and RNA-seq values were significantly correlated (Pearson's r > 0.90). Our results provide new insights into bovine liver gene expression among dairy, dual-purpose and beef breeds by identifying large numbers of DEG markers, submitted to the NCBI Gene Expression Omnibus (GEO) under accession number GSE114233, which can serve as useful genetic tools for developing gene assays for trait-associated studies as well as for implementation in genomic selection (GS) cattle breeding programs in Poland.
