Population-level annotation of lncRNAs in Arabidopsis thaliana reveals extensive expression and epigenetic variability associated with TE-like silencing [smallRNAseq]
ABSTRACT: Background: Long non-coding RNAs (lncRNAs) are under-studied and under-annotated in plants. In mammals, lncRNA expression has been shown to approach the extent of protein-coding expression and to be highly variable between individuals of the same species. Using A. thaliana as a model plant organism, we aimed to understand the true scope of lncRNA transcription across plants from different regions, characterize natural variability in lncRNA expression, and study the causes of this variability. Results: Using RNA-seq data spanning 499 natural lines and 4 developmental stages to create a more comprehensive annotation of lncRNAs in A. thaliana, we found over 10,000 novel loci, three times as many as in the current public annotation. We showed that, while lncRNA loci are ubiquitous in the genome, most appear to be actively silenced, and both their expression and their repressive chromatin levels are extremely variable between natural lines. This was particularly prominent for intergenic lncRNAs (lincRNAs): TE-like sequences, present in 50% of these loci, are associated with increased silencing and variation, and such lincRNAs tend to be targeted by the TE silencing machinery. Conclusion: lncRNAs are ubiquitous in the A. thaliana genome, but their expression is highly variable between different lines and tissues. This high expression variability is largely caused by high structural and epigenetic variability of non-coding loci, especially those containing pieces of transposable elements. We created the most comprehensive A. thaliana lncRNA annotation to date and improved our understanding of plant lncRNA biology.
Project description: Background: Long non-coding RNAs (lncRNAs) are increasingly implicated as gene regulators and may ultimately be more numerous than protein-coding genes in the human genome. Despite the large number of reported lncRNAs, reference annotations are likely incomplete because lncRNAs are expressed at lower levels and in a more tissue-specific manner than mRNAs. An unexplored factor potentially confounding lncRNA identification is inter-individual expression variability. Here, we characterize natural variability in lncRNA expression in human primary granulocytes. Results: We annotate granulocyte lncRNAs and mRNAs in RNA-seq data from ten healthy individuals, identifying multiple lncRNAs absent from reference annotations, and use this annotation to investigate three known features of lncRNAs relative to mRNAs: higher tissue specificity, lower expression, and reduced splicing efficiency. Expression variability was examined in seven individuals, each sampled three times at intervals of one month or more. We show that lncRNAs display significantly more inter-individual expression variability than mRNAs. We confirm this finding in two independent human datasets by analyzing multiple tissues from the GTEx project and lymphoblastoid cell lines from the GEUVADIS project. Using the latter dataset, we also show that including more human donors in the transcriptome annotation pipeline allows identification of an increasing number of lncRNAs but minimally affects the number of mRNA genes. Conclusions: A comprehensive annotation of lncRNAs is known to require an approach that is sensitive to low and tissue-specific expression. Here we show that increased inter-individual expression variability is an additional general lncRNA feature to consider when creating a comprehensive annotation of human lncRNAs or proposing their use as prognostic or disease markers.
We used PolyA+ RNA-seq data from human primary granulocytes of 10 healthy individuals to annotate lncRNAs and mRNAs de novo in this cell type, and ribosome-depleted (total) RNA-seq data from seven of these individuals, each sampled three times, to analyze lncRNA and mRNA expression variability.
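The description does not say which variability statistic was used. As a minimal sketch, assuming the per-gene coefficient of variation (CV, sample standard deviation divided by the mean across individuals) as the measure, a comparison of lncRNA and mRNA variability could look like the following. The gene IDs, the helper names `coefficient_of_variation` and `median_cv`, and all expression values are hypothetical toy inputs, not data or code from the study.

```python
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean; 0.0 for a gene with zero mean."""
    mean = statistics.mean(values)
    if mean == 0:
        return 0.0
    return statistics.stdev(values) / mean


def median_cv(expression):
    """Median CV over all genes in a {gene_id: [expression per individual]} mapping."""
    return statistics.median(coefficient_of_variation(v) for v in expression.values())


# Hypothetical toy expression values (e.g. TPM) for three individuals; not real data.
mrnas = {
    "mRNA_A": [100.0, 110.0, 95.0],
    "mRNA_B": [50.0, 55.0, 48.0],
}
lncrnas = {
    "lnc_A": [2.0, 9.0, 0.5],
    "lnc_B": [5.0, 1.0, 12.0],
}

print(f"median CV, mRNAs:   {median_cv(mrnas):.2f}")
print(f"median CV, lncRNAs: {median_cv(lncrnas):.2f}")
```

CV is used here because it is scale-free, so genes with very different mean expression can be compared; with low-expressed lncRNAs, a real analysis would also need to account for the extra sampling noise at low counts.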
Project description: Population-level annotation of lncRNAs in Arabidopsis thaliana reveals extensive expression and epigenetic variability associated with TE-like silencing [lncRNA]
Project description: Long non-coding RNAs (lncRNAs) are defined as non-protein-coding transcripts that are at least 200 nucleotides long. They are known to play pivotal roles in regulating gene expression, especially during stress responses in plants. We used a large collection of in-house transcriptome data from various soybean (Glycine max and Glycine soja) tissues treated under different conditions to perform a comprehensive identification of soybean lncRNAs. We also retrieved publicly available soybean transcriptome data of sufficient quality and sequencing depth to enrich our analysis. In total, RNA-seq data from 332 samples were used. We developed an integrated transcript assembly approach, combining reference-based and de novo assembly, that identified ~69,000 lncRNA gene loci. We showed that lncRNAs are distinct from both protein-coding transcripts and genomic background noise in terms of length, number of exons, transposable element composition, and level of sequence conservation across legume species. The tissue-specific and time-specific transcriptional responses of the lncRNA genes under some stress conditions suggest that they may be biologically relevant. The transcription start sites of lncRNA gene loci tend to be close to their nearest protein-coding genes, and the lncRNAs may be transcriptionally related to those protein-coding genes, particularly in the case of antisense and intronic lncRNAs. A previously unreported subset of small peptide-coding transcripts was identified from these lncRNA loci by tandem mass spectrometry, which paves the way for investigating their functional roles. Our results also highlight the current inadequacy of the bioinformatic definition of lncRNA, which excludes lncRNA gene loci with small open reading frames (ORFs) from being regarded as protein-coding.
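The point about small ORFs can be made concrete with a toy filter. The sketch below applies a length-plus-ORF rule of the kind the description criticizes: at least 200 nt (from the definition above) and no ORF at or above an assumed 100-codon cutoff. The cutoff, the forward-strand-only ORF scan, and the function names are illustrative assumptions, not the pipeline used in the soybean study.

```python
# Hypothetical sketch of a length + ORF filter for calling lncRNAs.
# The 200-nt minimum comes from the definition above; the 100-codon ORF
# cutoff and the forward-strand-only scan are simplifying assumptions.
MIN_LNCRNA_LENGTH = 200
MAX_ORF_CODONS = 100
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf_codons(seq: str) -> int:
    """Return the length in codons (stop included) of the longest ATG...stop ORF
    on the forward strand; ORFs that run off the 3' end are ignored."""
    seq = seq.upper()
    best = 0
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                best = max(best, (i + 3 - start) // 3)
                start = None
    return best


def is_candidate_lncrna(seq: str) -> bool:
    """True if the transcript is long enough and lacks an ORF at or above the cutoff."""
    return len(seq) >= MIN_LNCRNA_LENGTH and longest_orf_codons(seq) < MAX_ORF_CODONS


# Toy 300-nt transcript carrying a 50-codon ORF (Met + 48 Ala + stop),
# i.e. it could encode a 49-aa peptide.
small_orf = "ATG" + "GCT" * 48 + "TAA"
transcript = "C" * 50 + small_orf + "C" * 100

print(len(transcript), longest_orf_codons(transcript), is_candidate_lncrna(transcript))
```

Under these assumed thresholds, the toy transcript is classified as an lncRNA even though it could encode a short peptide. This is the kind of small-ORF locus that the description argues such definitions fail to regard as protein-coding.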
Project description: Population-level annotation of lncRNAs in Arabidopsis thaliana reveals extensive expression and epigenetic variability associated with TE-like silencing