Dataset Information

Long-read sequencing of the coffee bean transcriptome reveals the diversity of full-length transcripts.

ABSTRACT: Polyploidization contributes to the complexity of gene expression, resulting in numerous related but different transcripts. This study explored the transcriptome diversity and complexity of the tetraploid Arabica coffee (Coffea arabica) bean. Long-read sequencing (LRS) by PacBio Isoform sequencing (Iso-seq) was used to obtain full-length transcripts without the difficulty and uncertainty of assembly required for reads from short-read technologies. The tetraploid transcriptome was annotated and compared with data from the sub-genome progenitors. Caffeine and sucrose genes were targeted for case analysis. An isoform-level tetraploid coffee bean reference transcriptome with 95 995 distinct transcripts (average 3236 bp) was obtained. A total of 88 715 sequences (92.42%) were annotated with BLASTx against NCBI non-redundant plant proteins, including 34 719 high-quality annotations. Further BLASTn analysis against NCBI non-redundant nucleotide sequences, Coffea canephora coding sequences with UTR, C. arabica ESTs, and Rfam left 1213 sequences without hits; these are potential novel genes in coffee. Longer UTRs were captured, especially 5′ UTRs, facilitating the identification of upstream open reading frames. LRS also revealed more and longer transcript variants in key caffeine and sucrose metabolism genes from this polyploid genome. Long sequences (>10 kb) were poorly annotated. LRS technology reveals the limitations of previous studies. It provides an important tool for producing a reference transcriptome that includes more of the diversity of full-length transcripts, helping to understand the biology and support the genetic improvement of polyploid species such as coffee.

SUBMITTER: Cheng B

PROVIDER: S-EPMC5737654 | biostudies-literature | 2017 Nov

REPOSITORIES: biostudies-literature

Publications

Long-read sequencing of the coffee bean transcriptome reveals the diversity of full-length transcripts.

Cheng B, Furtado A, Henry RJ

GigaScience, 2017 Nov 1, issue 11

PMID: 29048540

Similar Datasets

Project description: Background: Posttranscriptional processing of precursor mRNAs contributes to transcriptome and protein diversity and to gene regulatory mechanisms in eukaryotes. However, this posttranscriptional mechanism has not been studied in the marine macroalga Gracilariopsis lemaneiformis, which is the most cultivated red seaweed species in China. Results: In the present study, third-generation sequencing (Pacific Biosciences single-molecule real-time long-read sequencing, SMRT-Seq) was used to sequence the full-length transcriptome of G. lemaneiformis to identify alternatively spliced transcripts and alternative polyadenylation (APA) sites in this species. RNAs were isolated from G. lemaneiformis under various treatments, including abiotic stresses and exogenous phytohormones, and then pooled in equal amounts for SMRT-Seq. In total, 346,544 full-length nonchimeric reads were generated, from which 13,630 unique full-length transcripts were obtained in G. lemaneiformis. Compared with the known splicing events in the gene models, more than 3000 new alternative splicing (AS) events were identified in the SMRT-Seq reads. Additionally, 810 genes were found to have poly(A) sites, and 91 microRNAs (miRNAs), 961 long noncoding RNAs and 1721 novel genes were identified in G. lemaneiformis. Moreover, validation experiments showed that abiotic stresses and phytohormones could induce some specific AS events, especially intron-retention isoforms, alter the relative ratios of transcripts annotated to the same gene, and generate novel 3' ends through differential APA. The growth of G. lemaneiformis was inhibited by Cu stress, while this inhibition was alleviated by ACC treatment. RNA-Seq analysis further revealed 211 differential alternative splicing (DAS) events in CK vs Cu and 142 DAS events in Cu vs Cu + ACC, suggesting that AS of functional genes could be regulated by Cu stress and ACC. Compared with Cu stress, transcripts with DAS events that were mainly involved in the carbon fixation in photosynthetic organisms pathway and the oxidative phosphorylation pathway were upregulated in the Cu + ACC treatment, indicating that ACC alleviated the growth inhibition caused by Cu stress by increasing carbon fixation and oxidative phosphorylation. Conclusions: Our results provide the first comprehensive picture of the full-length transcriptome and posttranscriptional mechanism in red macroalgae, including transcripts that appeared in the presence of common abiotic stresses and phytohormones, which will improve the gene annotations of Gracilariopsis and contribute to the study of gene regulation in this important cultivated seaweed.
