Dataset Information

De novo transcriptome assembly from inflorescence of Orchis italica: analysis of coding and non-coding transcripts.

ABSTRACT: The floral transcriptome of Orchis italica, a wild orchid species, was obtained using Illumina RNA-seq technology and specific de novo assembly and analysis tools. More than 100 million raw reads were processed resulting in 132,565 assembled transcripts and 86,079 unigenes with an average length of 606 bp and N50 of 956 bp. Functional annotation assigned 38,984 of the unigenes to records present in the NCBI non-redundant protein database, 32,161 of them to Gene Ontology terms, 15,775 of them to Eukaryotic Orthologous Groups (KOG) and 7,143 of them to Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways. The in silico expression analysis based on the Fragments Per Kilobase of transcript per Million mapped reads (FPKM) was confirmed by real-time RT-PCR experiments on 10 selected unigenes, which showed high and statistically significant positive correlation with the RNA-seq based expression data. The prediction of putative long non-coding RNAs was assessed using two different software packages, CPC and Portrait, resulting in 7,779 unannotated unigenes that matched the threshold values for both of the analyses. Among the predicted long non-coding RNAs, one is the homologue of TAS3, a long non-coding RNA precursor of trans-acting small interfering RNAs (ta-siRNAs). The differential expression pattern observed for the selected putative long non-coding RNAs suggests their possible functional role in different floral tissues.

SUBMITTER: De Paolo S

PROVIDER: S-EPMC4099010 | biostudies-literature | 2014

REPOSITORIES: biostudies-literature

ACCESS DATA

Publications

De novo transcriptome assembly from inflorescence of Orchis italica: analysis of coding and non-coding transcripts.

De Paolo Sofia S Salvemini Marco M Gaudio Luciano L Aceto Serena S

PloS one 20140715 7

The floral transcriptome of Orchis italica, a wild orchid species, was obtained using Illumina RNA-seq technology and specific de novo assembly and analysis tools. More than 100 million raw reads were processed resulting in 132,565 assembled transcripts and 86,079 unigenes with an average length of 606 bp and N50 of 956 bp. Functional annotation assigned 38,984 of the unigenes to records present in the NCBI non-redundant protein database, 32,161 of them to Gene Ontology terms, 15,775 of them to ...[more]

PMID: 25025767

Similar Datasets

Project description:BackgroundPolyploidy is very common in plants and can be seen as one of the key drivers in the domestication of crops and the establishment of important agronomic traits. It can be the main source of genomic repatterning and introduces gene duplications, affecting gene expression and alternative splicing. Since fully sequenced genomes are not yet available for many plant species including crops, de novo transcriptome assembly is the basis to understand molecular and functional mechanisms. However, in complex polyploid plants, de novo transcriptome assembly is challenging, leading to increased rates of fused or redundant transcripts. Since assemblers were developed mainly for diploid organisms, they may not well suited for polyploids. Also, comparative evaluations of these tools on higher polyploid plants are extremely rare. Thus, our aim was to fill this gap and to provide a basic guideline for choosing the optimal de novo assembly strategy focusing on autotetraploids, as the scientific interest in this type of polyploidy is steadily increasing.ResultsWe present a comparison of two common (SOAPdenovo-Trans, Trinity) and one recently published transcriptome assembler (TransLiG) on diploid and autotetraploid species of the genera Acer and Vaccinium using Arabidopsis thaliana as a reference. The number of assembled transcripts was up to 11 and 14 times higher with an increased number of short transcripts for Acer and Vaccinium, respectively, compared to A. thaliana. In diploid samples, Trinity and TransLiG performed similarly good while in autotetraploids, TransLiG assembled most complete transcriptomes with an average of 1916 assembled BUSCOs vs. 1705 BUSCOs for Trinity. Of all three assemblers, SOAPdenovo-Trans performed worst (1133 complete BUSCOs).ConclusionAll three assembly tools produced complete assemblies when dealing with the model organism A. thaliana, independently of its ploidy level, but their performances differed extremely when it comes to non-model autotetraploids, where specifically TransLiG and Trinity produced a high number of redundant transcripts. The recently published assembler TransLiG has not been tested yet on any plant organism but showed highest completeness and full-length transcriptomes, especially in autotetraploids. Including such species during the development and testing of new assembly tools is highly appreciated and recommended as many important crops are polyploid.

Dataset Information

De novo transcriptome assembly from inflorescence of Orchis italica: analysis of coding and non-coding transcripts.

Publications

De novo transcriptome assembly from inflorescence of Orchis italica: analysis of coding and non-coding transcripts.

Similar Datasets

OmicsDI is part of the ELIXIR infrastructure

Tweets